WEAKLY DEPENDENT FUNCTIONAL DATA
Université Libre de Bruxelles and Utah State University

Functional data often arise from measurements on fine time grids and are obtained by separating an almost continuous time record into natural consecutive intervals, for example, days. The functions thus obtained form a functional time series, and the central issue in the analysis of such data consists in taking into account the temporal dependence of these functional observations. Examples include daily curves of financial transaction data and daily patterns of geophysical and environmental data. For scalar and vector valued stochastic processes, a large number of dependence notions have been proposed, mostly involving mixing type distances between σ-algebras. In time series analysis, measures of dependence based on moments have proven most useful (autocovariances and cumulants). We introduce a moment-based notion of dependence for functional time series which involves m-dependence. We show that it is applicable to linear as well as nonlinear functional time series. Then we investigate the impact of dependence thus quantified on several important statistical procedures for functional data. We study the estimation of the functional principal components, the long-run covariance matrix, change point detection and the functional linear model. We explain when temporal dependence affects the results obtained for i.i.d. functional observations and when these results are robust to weak dependence.

1. Introduction. The assumption of independence is often too strong to be realistic in many applications, especially if data are collected sequentially over time. It is then natural to expect that the current observation depends to some degree on the previous observations. This remains true for functional data and has motivated the development of appropriate functional time series models. The most popular model is the autoregressive model of Bosq [14]. This model and its various extensions are particularly useful for prediction (see, e.g., Besse, Cardot and Stephenson [11], Damon and Guillas [23], Antoniadis and Sapatinas [4]). For many functional time series it is, however, not clear what specific model they follow, and for many statistical procedures it is not necessary to assume a specific model. In such cases, it is important to know what the effect of the dependence on a given procedure is. Is it robust to temporal dependence, or does this type of dependence introduce a serious bias? To answer questions of this type, it is essential to quantify the notion of temporal dependence. For scalar and vector time series, this question has been approached from a number of angles, but, except for the linear model of Bosq [14], no general framework is available for functional time series data. Our goal in this paper is to propose such a framework, which applies to both linear and nonlinear dependence, develop the requisite theory and apply it to selected problems in the analysis of functional time series. Our examples are chosen to show that some statistical procedures for functional data are robust to temporal dependence as quantified in this paper, while others require modifications that take this dependence into account.

While we focus here on a general theoretical framework, this research has been motivated by our work with functional data arising in space physics and environmental science. For such data, especially for the space physics data, no validated time series models are currently available, so to justify any inference drawn from them, they must fit into a general, one might say, nonparametric, dependence scheme. An example of space physics data is shown in Figure 1. Temporal dependence from day to day can be discerned, but has not been modeled.

Fig. 1. Ten consecutive functional observations of a component of the magnetic field recorded at College, Alaska. The vertical lines separate days. Long negative spikes lasting a few hours correspond to the aurora borealis.

The paper is organized as follows. In Section 2 we introduce our dependence condition and illustrate it with several examples. In particular, we show that linear functional processes fall into our framework, and present some nonlinear models that also do. It is now recognized that the functional principal components (FPCs) play a far greater role than their multivariate counterparts (Yao and Lee [64], Hall and Hosseini-Nasab [33], Reiss and Ogden [51], Benko, Härdle and Kneip [6], Müller and Yao [45]). To develop theoretical justification for procedures involving the FPCs, it is necessary to use the convergence of the estimated FPCs to their population counterparts. Results of this type are available only for independent observations (Dauxois, Pousse and Romain [24]) and for linear processes (Bosq [14], Bosq and Blanke [15]). We show in Section 3 how the consistency of the estimators for the eigenvalues and eigenfunctions of the covariance operator extends to dependent functional data. Next, in Section 4, we turn to the estimation of an appropriately defined long-run variance matrix for functional data. For most time series procedures, the long-run variance plays a role analogous to the variance-covariance matrix for independent observations. Its estimation is therefore of fundamental importance, and has been a subject of research for many decades (Anderson [1], Andrews [3] and Hamilton [34] provide the background and numerous references). In Sections 5 and 6, we illustrate the application of the results of Sections 3 and 4 on two problems of recent interest: change point detection for functional data and the estimation of the kernel in the functional linear model. We show that the detection procedure of Berkes et al. [7] must be modified if the data exhibit dependence, but the estimation procedure of Yao, Müller and Wang [65] is robust to mild dependence. Section 5 also contains a small simulation study and a data example. The proofs are collected in the Appendix.

2. Approximable functional time series. The notion of weak dependence has, over the past decades, been formalized in many ways. Perhaps the most popular are various mixing conditions (see Doukhan [25], Bradley [16]), but in recent years several other approaches have also been introduced (see Doukhan and Louhichi [26] and Wu [62], [63], among others). In time series analysis, moment-based measures of dependence, most notably autocorrelations and cumulants, have gained universal acceptance. The measure we consider below is a moment-type quantity, but it is also related to the mixing conditions as it considers σ-algebras m time units apart, with m tending to infinity.
A most direct relaxation of independence is m-dependence. Suppose $\{X_n\}$ is a sequence of random elements taking values in a measurable space $S$. Denote by $\mathcal{F}_k^- = \sigma\{\ldots, X_{k-2}, X_{k-1}, X_k\}$ and $\mathcal{F}_k^+ = \sigma\{X_k, X_{k+1}, X_{k+2}, \ldots\}$ the σ-algebras generated by the observations up to time $k$ and after time $k$, respectively. Then the sequence $\{X_n\}$ is said to be m-dependent if for any $k$, the σ-algebras $\mathcal{F}_k^-$ and $\mathcal{F}_{k+m}^+$ are independent.
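As a minimal illustration (our own toy example, not taken from the paper), a scalar moving average of order $m-1$, $X_n = \varepsilon_n + \cdots + \varepsilon_{n-m+1}$ with i.i.d. Gaussian $\varepsilon_n$, is m-dependent: $X_k$ and $X_{k+m}$ are built from disjoint blocks of innovations. The sketch below checks this empirically through sample autocorrelations; the helper names are hypothetical.

```python
import numpy as np


def moving_average(eps, m):
    """X_n = eps_n + eps_{n-1} + ... + eps_{n-m+1} (sums over length-m windows)."""
    return np.convolve(eps, np.ones(m), mode="valid")


def sample_autocorr(x, lag):
    """Sample autocorrelation at a positive lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)


rng = np.random.default_rng(1)
eps = rng.standard_normal(200_000)
m = 3
x = moving_average(eps, m)

# Windows of X_k and X_{k+m} do not overlap, so the lag-m correlation is zero;
# at lag 1 the windows share m - 1 = 2 innovations, giving correlation 2/3.
rho_lag1 = sample_autocorr(x, 1)
rho_lag_m = sample_autocorr(x, m)
print(f"lag 1: {rho_lag1:.3f}, lag m: {rho_lag_m:.3f}")
```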

Most time series models are not m-dependent. Rather, various measures of dependence decay sufficiently fast as the distance $m$ between the σ-algebras $\mathcal{F}_k^-$ and $\mathcal{F}_{k+m}^+$ increases. However, m-dependence can be used as a tool to study properties of many nonlinear sequences (see, e.g., Hörmann [35] and Berkes, Hörmann and Schauer [8] for recent applications). The general idea is to approximate $\{X_n, n \in \mathbb{Z}\}$ by m-dependent processes

$\{X_n^{(m)}, n \in \mathbb{Z}\}, \qquad m \ge 1.$

The goal is to establish that for every $n$ the sequence $\{X_n^{(m)}, m \ge 1\}$ converges in some sense to $X_n$ as $m \to \infty$. If the convergence is fast enough, then one can obtain the limiting behavior of the original process from corresponding results for m-dependent sequences. Definition 2.1 formalizes this idea and sets up the necessary framework for the construction of such m-dependent approximation sequences. The idea of approximating scalar sequences by m-dependent nonlinear moving averages appears already in Section 21 of Billingsley [12], and it was developed in several directions by Pötscher and Prucha [48].

In the sequel we let $H = L^2([0,1], \mathcal{B}_{[0,1]}, \lambda)$ be the Hilbert space of square integrable functions defined on $[0,1]$. For $f \in H$ we set $\|f\|^2 = \int_0^1 |f(t)|^2\,dt$. All our random elements are assumed to be defined on some common probability space $(\Omega, \mathcal{A}, P)$. For $p \ge 1$ we denote by $L^p = L^p(\Omega, \mathcal{A}, P)$ the space of (classes of) real valued random variables such that $\|X\|_p = (E|X|^p)^{1/p} < \infty$. Further we let $L^p_H = L^p_H(\Omega, \mathcal{A}, P)$ be the space of $H$ valued random variables $X$ such that $\nu_p(X) = (E\|X\|^p)^{1/p} < \infty$.

Definition 2.1. A sequence $\{X_n\} \in L^p_H$ is called $L^p$-m-approximable if each $X_n$ admits the representation

(2.1)  $X_n = f(\varepsilon_n, \varepsilon_{n-1}, \ldots),$

where the $\varepsilon_i$ are i.i.d. elements taking values in a measurable space $S$, and $f$ is a measurable function $f: S^\infty \to H$. Moreover we assume that if $\{\varepsilon_i'\}$ is an independent copy of $\{\varepsilon_i\}$ defined on the same probability space, then letting

(2.2)  $X_n^{(m)} = f(\varepsilon_n, \varepsilon_{n-1}, \ldots, \varepsilon_{n-m+1}, \varepsilon_{n-m}', \varepsilon_{n-m-1}', \ldots),$

we have

(2.3)  $\sum_{m=1}^{\infty} \nu_p(X_m - X_m^{(m)}) < \infty.$
For our applications, choosing $p = 4$ will be convenient, but any $p \ge 1$ can be used, depending on what is needed. (Our definition makes sense even if $p < 1$, but then $\nu_p$ is no longer a norm.) Definition 2.1 implies that $\{X_n\}$ is strictly stationary. It is clear from the representation of $X_n$ and $X_n^{(m)}$ that $E\|X_m - X_m^{(m)}\|^p = E\|X_1 - X_1^{(m)}\|^p$, so that condition (2.3) could be formulated solely in terms of $X_1$ and the approximations $X_1^{(m)}$. Obviously the sequence $\{X_n^{(m)}, n \in \mathbb{Z}\}$ as defined in (2.2) is not m-dependent, because all its elements share the same copy $\{\varepsilon_k'\}$. To obtain m-dependence, we need to define for each $n$ an independent copy $\{\varepsilon_k^{(n)}\}$ of $\{\varepsilon_k\}$ (this can always be achieved by enlarging the probability space) which is then used instead of $\{\varepsilon_k'\}$ to construct $X_n^{(m)}$; that is, we set

(2.4)  $X_n^{(m)} = f(\varepsilon_n, \varepsilon_{n-1}, \ldots, \varepsilon_{n-m+1}, \varepsilon_{n-m}^{(n)}, \varepsilon_{n-m-1}^{(n)}, \ldots).$

We will call this method the coupling construction. Since this modification leaves condition (2.3) unchanged, we will assume from now on that the $X_n^{(m)}$ are defined by (2.4). Then, for each $m \ge 1$, the sequences $\{X_n^{(m)}, n \in \mathbb{Z}\}$ are strictly stationary and m-dependent, and each $X_n^{(m)}$ is equal in distribution to $X_n$.
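The coupling construction (2.4) can be sketched numerically for a scalar toy model. The map $f(\varepsilon_n, \varepsilon_{n-1}, \ldots) = \sum_j 0.6^j \tanh(\varepsilon_{n-j})$, the truncation of the past at 60 lags and all helper names are hypothetical choices made only for this illustration. Each Monte Carlo row draws its own fresh copy of the old innovations, mimicking the per-$n$ copies $\{\varepsilon_k^{(n)}\}$; the output suggests that $\nu_2(X_n - X_n^{(m)})$ decays geometrically in $m$ while $X_n^{(m)}$ keeps the law of $X_n$.

```python
import numpy as np

RHO, LAGS = 0.6, 60  # hypothetical decay rate; innovations older than LAGS lags are ignored


def f(window):
    """Hypothetical causal map f(eps_n, eps_{n-1}, ...) = sum_j RHO^j tanh(eps_{n-j}).

    window[..., j] holds eps_{n-j}, j = 0, ..., LAGS - 1.
    """
    return np.tanh(window) @ (RHO ** np.arange(window.shape[-1]))


def coupled(window, m, rng):
    """Coupling construction (2.4): keep eps_n, ..., eps_{n-m+1} and replace all
    older innovations by an independent copy drawn for this n only."""
    mixed = window.copy()
    mixed[..., m:] = rng.standard_normal(mixed[..., m:].shape)
    return f(mixed)


rng = np.random.default_rng(0)
windows = rng.standard_normal((20_000, LAGS))  # each row: one draw of (eps_n, eps_{n-1}, ...)
x = f(windows)

nu2, var_ratio = {}, {}
for m in (1, 2, 4, 8):
    xm = coupled(windows, m, rng)
    nu2[m] = np.sqrt(np.mean((x - xm) ** 2))  # Monte Carlo nu_2(X_n - X_n^{(m)})
    var_ratio[m] = xm.var() / x.var()         # close to 1: X_n^{(m)} has the law of X_n
    print(f"m={m}: nu_2 ~ {nu2[m]:.4f}, variance ratio ~ {var_ratio[m]:.3f}")
```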

The coupling construction is only one of a variety of possible m-dependent approximations. In most applications, the measurable space $S$ coincides with $H$, and the $\varepsilon_n$ represent model errors. In this case, we can set

(2.5)  $\tilde{X}_n^{(m)} = f(\varepsilon_n, \varepsilon_{n-1}, \ldots, \varepsilon_{n-m+1}, 0, 0, \ldots).$

The sequence $\{\tilde{X}_n^{(m)}, n \in \mathbb{Z}\}$ is strictly stationary and m-dependent, but $\tilde{X}_n^{(m)}$ is no longer equal in distribution to $X_n$. This is not a big problem but requires additional lines in the proofs. For the truncation construction (2.5), condition (2.3) is replaced by

(2.6)  $\sum_{m=1}^{\infty} \nu_p(X_m - \tilde{X}_m^{(m)}) < \infty.$

Since $E\|X_m^{(m)} - \tilde{X}_m^{(m)}\|^p = E\|\tilde{X}_m^{(m)} - X_m\|^p$, the triangle inequality gives $\nu_p(X_m - X_m^{(m)}) \le 2\nu_p(X_m - \tilde{X}_m^{(m)})$, so (2.6) implies (2.3), but not vice versa. Thus the coupling construction allows one to study a slightly broader class of time series.
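The comparison between the truncation construction (2.5) and the coupling construction (2.4) can be checked on the same kind of hypothetical scalar toy model, $X_n = \sum_j 0.6^j \tanh(\varepsilon_{n-j})$ with the past truncated at 60 lags (our assumption, chosen so that $f(\ldots, 0, 0, \ldots)$ is well defined). The sketch estimates the three distances involved; the two distances to the truncated value agree up to Monte Carlo error, and Minkowski's inequality holds exactly for the empirical $\nu_p$.

```python
import numpy as np

RHO, LAGS = 0.6, 60  # hypothetical toy model: X_n = sum_j RHO^j tanh(eps_{n-j})


def f(window):
    return np.tanh(window) @ (RHO ** np.arange(window.shape[-1]))


def nu(z, p):
    """Empirical nu_p of scalar samples: (mean |z|^p)^(1/p)."""
    return np.mean(np.abs(z) ** p) ** (1 / p)


rng = np.random.default_rng(0)
reps, m, p = 20_000, 3, 4
windows = rng.standard_normal((reps, LAGS))   # rows: (eps_n, eps_{n-1}, ...)

coupled_w = windows.copy()
coupled_w[:, m:] = rng.standard_normal((reps, LAGS - m))  # (2.4): independent copies
trunc_w = windows.copy()
trunc_w[:, m:] = 0.0                                      # (2.5): zeros replace old errors

x, x_cpl, x_trc = f(windows), f(coupled_w), f(trunc_w)

d_true_trc = nu(x - x_trc, p)      # nu_p(X_m - X~_m^{(m)}), the summand in (2.6)
d_cpl_trc = nu(x_cpl - x_trc, p)   # same law as the line above
d_true_cpl = nu(x - x_cpl, p)      # nu_p(X_m - X_m^{(m)}), the summand in (2.3)
print(d_true_trc, d_cpl_trc, d_true_cpl)
print(x.var(), x_trc.var())        # truncation shrinks the variance: not equal in law
```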

An important question that needs to be addressed at this point is how our notion of weak dependence compares to other existing ones. The closest relative of $L^p$-m-approximability is the notion of $L^p$-approximability studied by Pötscher and Prucha [48] for scalar and vector-valued processes. Since our definition applies with an obvious modification to sequences with values in any normed vector space $H$ (especially $\mathbb{R}$ or $\mathbb{R}^n$), it can be seen as a generalization of $L^p$-approximability. There are, however, important differences. By definition, $L^p$-approximability only allows for approximations that are, like the truncation construction, measurable with respect to a finite selection of basis vectors, $\varepsilon_n, \ldots, \varepsilon_{n-m}$, whereas the coupling construction does not impose this condition. On the other hand, $L^p$-approximability is not based on independence of the innovation process. Instead, independence is relaxed to certain mixing conditions. Clearly, m-dependence implies the CLT, and so our $L^p$-m-approximability implies central limit theorems for practically all important time series models. As we have shown in previous papers [5, 8, 35, 36], a scalar version of this notion has much more potential than solely giving central limit theorems.

The concept of weak dependence introduced in Doukhan and Louhichi 
[26] is defined for scalar variables in a very general framework and has been 
successfully used to prove (empirical) FCLTs. Like our approach, it does not 
require smoothness conditions. Its extensions to problems of functional data 
analysis have not been studied yet. 

Another approach to weak dependence is a martingale approximation, as developed in Gordin [31] and Philipp and Stout [47]. In the context of sequences $\{X_n\}$ of the form (2.1), particularly complete results have been proved by Wu [62, 63]. Again, $L^p$-m-approximability cannot be directly compared to approximating martingale conditions; the latter hold for a very large class of processes, but, unlike $L^p$-m-approximability, they apply only in the context of partial sums.

The classical approach to weak dependence, developed in the seminal papers of Rosenblatt [54] and Ibragimov [37], uses the strong mixing property and its variants like β, φ, ρ and ψ mixing. The general idea is to measure the maximal dependence between two events lying in the "past" $\mathcal{F}_k^-$ and in the "future" $\mathcal{F}_{k+m}^+$, respectively. The fading memory is described by this maximal dependence decaying to zero for $m$ growing to $\infty$. For example, the α-mixing coefficient is given by

$\alpha_m = \sup\{|P(A \cap B) - P(A)P(B)| : A \in \mathcal{F}_k^-, B \in \mathcal{F}_{k+m}^+, k \in \mathbb{Z}\}.$

A sequence is called α-mixing (strong mixing) if $\alpha_m \to 0$ for $m \to \infty$.

This method yields very sharp results (for a complete account of the classical theory see Bradley [16]), but verifying mixing conditions of the above type is not easy, whereas the verification of $L^p$-m-approximability is almost immediate, as our examples below show. This is because the $L^p$-m-approximability condition directly uses the model specification $X_n = f(\varepsilon_n, \varepsilon_{n-1}, \ldots)$. Another problem is that even when mixing applies (e.g., for Markov processes), it typically requires strong smoothness conditions. For example, for the AR(1) process

$Y_n = \rho Y_{n-1} + \varepsilon_n, \qquad 0 < \rho \le 1/2,$

with Bernoulli innovations, strong mixing fails to hold (cf. Andrews [2]). Since c-mixing, where c is either of ψ, φ, β or ρ, implies α-mixing, $\{Y_n\}$ above satisfies none of these mixing conditions, whereas Example 2.1 shows that the AR(1) process is $L^p$-m-approximable without requiring any smoothness properties for the innovations process. Consequently our condition does not imply strong mixing. On the other hand, $L^p$-m-approximability is restricted to a more limited class of processes, namely processes allowing the representation $X_n = f(\varepsilon_n, \varepsilon_{n-1}, \ldots)$. We emphasize, however, that all time series models used in practice (scalar, vector or functional) have this representation (cf. [49, 59, 60]), as an immediate consequence of their "forward" dynamics, for example, their definitions by stochastic recurrence equations. See the papers of Rosenblatt [55-57] for sufficient criteria.
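The Bernoulli AR(1) example can be made concrete with $\rho = 1/2$ and Bernoulli(1/2) innovations (our choice within the range above), truncating the past at $K = 40$ lags. Then $Y_n = \sum_j 2^{-j}\varepsilon_{n-j}$ is a binary expansion, so the present value determines every past innovation; this is only the intuition behind the failure of strong mixing, not Andrews' proof. At the same time the coupling gap satisfies $|Y_n - Y_n^{(m)}| \le \sum_{j \ge m} 2^{-j} = 2^{1-m}$ path by path, which is summable in $m$. The helper names are hypothetical.

```python
import numpy as np


def scaled_ar1(eps_window):
    """Return the integer 2^(K-1) * Y for Y = sum_{j=0}^{K-1} 2^(-j) eps_{n-j},
    where eps_window = (eps_n, eps_{n-1}, ..., eps_{n-K+1}) has 0/1 entries.
    Working with integers keeps the arithmetic exact."""
    k = len(eps_window)
    return sum(int(e) << (k - 1 - j) for j, e in enumerate(eps_window))


def recover_past(scaled_y, k):
    """Read eps_n, eps_{n-1}, ... back off the binary digits of 2^(K-1) * Y."""
    return [(scaled_y >> (k - 1 - j)) & 1 for j in range(k)]


rng = np.random.default_rng(42)
K = 40
eps = rng.integers(0, 2, size=K)        # Bernoulli(1/2) innovations, most recent first
eps_copy = rng.integers(0, 2, size=K)   # independent copy used by the coupling construction

# 1) The present value Y_n determines every past innovation in the window.
past_recovered = recover_past(scaled_ar1(eps), K) == [int(e) for e in eps]

# 2) Coupling gap |Y_n - Y_n^{(m)}| <= sum_{j>=m} 2^(-j) < 2^(1-m) on every path.
gaps = {}
for m in range(1, 11):
    mixed = np.concatenate([eps[:m], eps_copy[m:]])
    gaps[m] = abs(scaled_ar1(eps) - scaled_ar1(mixed)) / 2 ** (K - 1)
print(past_recovered, max(gaps[m] * 2 ** (m - 1) for m in gaps))
```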

We conclude that $L^p$-m-approximability is not directly comparable with classical mixing coefficients.

The following lemma shows that $L^p$-m-approximability is unaffected by linear transformations, whereas independence assumptions are needed for product type operations.

Lemma 2.1. Let $\{X_n\}$ and $\{Y_n\}$ be two $L^p$-m-approximable sequences in $L^p_H$. Define:

• $Z_n^{(1)} = A(X_n)$, where $A \in \mathcal{L}$;

• $Z_n^{(2)} = X_n + Y_n$;

• $Z_n^{(3)} = X_n \circ Y_n$ (point-wise multiplication);

• $Z_n^{(4)} = \langle X_n, Y_n \rangle$;

• $Z_n^{(5)} = X_n \otimes Y_n$.

Then $\{Z_n^{(1)}\}$ and $\{Z_n^{(2)}\}$ are $L^p$-m-approximable sequences in $L^p_H$. If $X_n$ and $Y_n$ are independent, then $\{Z_n^{(4)}\}$ and $\{Z_n^{(5)}\}$ are $L^p$-m-approximable sequences in the respective spaces. If $E\sup_{t\in[0,1]}|X_n(t)|^p + E\sup_{t\in[0,1]}|Y_n(t)|^p < \infty$, then $\{Z_n^{(3)}\}$ is $L^p$-m-approximable in $L^p_H$.

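Proof. The first two relations are immediate. We exemplify the rest of the simple proofs for $Z_n = Z_n^{(5)}$. For this we set $Z_n^{(m)} = X_n^{(m)} \otimes Y_n^{(m)}$ and note that $Z_m$ and $Z_m^{(m)}$ are (random) kernel operators, and thus Hilbert-Schmidt operators. Since

$\|Z_m - Z_m^{(m)}\|_{\mathcal{L}} \le \|Z_m - Z_m^{(m)}\|_S = \left(\iint \big(X_m(t)Y_m(s) - X_m^{(m)}(t)Y_m^{(m)}(s)\big)^2\,dt\,ds\right)^{1/2} \le \sqrt{2}\,\big(\|X_m\|\,\|Y_m - Y_m^{(m)}\| + \|Y_m^{(m)}\|\,\|X_m - X_m^{(m)}\|\big),$

the proof follows from the independence of $X_n$ and $Y_n$: by Minkowski's inequality and independence, $\nu_p(Z_m - Z_m^{(m)}) \le \sqrt{2}\,\big(\nu_p(X_0)\,\nu_p(Y_m - Y_m^{(m)}) + \nu_p(Y_0)\,\nu_p(X_m - X_m^{(m)})\big)$, and both terms are summable in $m$ by (2.3). □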
The proof shows that our assumption can be modified and independence is not required. However, if $X, Y$ are not independent, then $E|XY| \ne E|X|\,E|Y|$. We then have to use the Cauchy-Schwarz inequality and obviously need $2p$ moments.
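The key inequality in the proof above can be checked numerically once functions are discretized on a uniform grid of $[0,1]$ (our assumption; random vectors stand in for $X_m, Y_m$ and their approximations). The sketch computes the Hilbert-Schmidt norm of the kernel $x(t)y(s) - x_m(t)y_m(s)$, compares it with $\sqrt{2}(\|x\|\|y - y_m\| + \|y_m\|\|x - x_m\|)$, and also confirms $\|\cdot\|_{\mathcal{L}} \le \|\cdot\|_S$ for the discretized integral operator. Both ratios stay at most 1 because the discretized norms are themselves norms.

```python
import numpy as np

G = 100
dt = 1.0 / G                      # Riemann-sum weight on a uniform grid of [0, 1]


def l2(x):
    """Discretized L^2[0,1] norm."""
    return np.sqrt(np.sum(x ** 2) * dt)


rng = np.random.default_rng(3)
worst_ratio, worst_op_vs_hs = 0.0, 0.0
for _ in range(200):
    x, y = rng.standard_normal(G), rng.standard_normal(G)
    x_m = x + 0.1 * rng.standard_normal(G)        # stand-ins for X_m^{(m)}, Y_m^{(m)}
    y_m = y + 0.1 * rng.standard_normal(G)
    kernel = np.outer(x, y) - np.outer(x_m, y_m)  # (t, s) -> x(t) y(s) - x_m(t) y_m(s)
    hs = np.sqrt(np.sum(kernel ** 2) * dt * dt)   # Hilbert-Schmidt norm
    op = np.linalg.norm(kernel * dt, 2)           # operator norm of the discretized integral operator
    bound = np.sqrt(2) * (l2(x) * l2(y - y_m) + l2(y_m) * l2(x - x_m))
    worst_ratio = max(worst_ratio, hs / bound)
    worst_op_vs_hs = max(worst_op_vs_hs, op / hs)
print(worst_ratio, worst_op_vs_hs)   # both stay <= 1
```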

We want to point out that only a straightforward modification is necessary in order to generalize the theory of this paper to noncausal processes $X_n = f(\ldots, \varepsilon_{n+1}, \varepsilon_n, \varepsilon_{n-1}, \ldots)$. Our framework can also be extended to nonstationary sequences, for example, those of the form (2.1) where $\{\varepsilon_k\}$ is a sequence of independent, but not necessarily identically distributed, random variables, or those where

$X_n = f_n(\varepsilon_n, \varepsilon_{n-1}, \ldots).$

The m-dependent coupled process can be defined in exactly the same way as in the stationary case:

$X_n^{(m)} = f_n(\varepsilon_n, \ldots, \varepsilon_{n-m+1}, \varepsilon_{n-m}^{(n)}, \varepsilon_{n-m-1}^{(n)}, \ldots).$

A generalization of our method to nonstationarity would be useful, especially when the goal is to develop methodology for locally stationary data. Such work is, however, beyond the intended scope of this paper.

We now illustrate the applicability of Definition 2.1 with several examples. Let $\mathcal{L} = \mathcal{L}(H, H)$ be the set of bounded linear operators from $H$ to $H$. For $A \in \mathcal{L}$ we define the operator norm $\|A\|_{\mathcal{L}} = \sup_{\|x\| \le 1}\|Ax\|$. If the operator is Hilbert-Schmidt, then we denote by $\|A\|_S$ its Hilbert-Schmidt norm. Recall that for any Hilbert-Schmidt operator $A \in \mathcal{L}$, $\|A\|_{\mathcal{L}} \le \|A\|_S$.

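Example 2.1 (Functional autoregressive process). Suppose $\Psi \in \mathcal{L}$ satisfies $\|\Psi\|_{\mathcal{L}} < 1$. Let $\varepsilon_n \in L^2_H$ be i.i.d. with mean zero. Then there is a unique stationary sequence of random elements $X_n \in L^2_H$ such that

(2.7)  $X_n(t) = \Psi(X_{n-1})(t) + \varepsilon_n(t).$

For details see Chapter 3 of Bosq [14]. The AR(1) sequence (2.7) admits the expansion $X_n = \sum_{j=0}^{\infty}\Psi^j(\varepsilon_{n-j})$, where $\Psi^j$ is the $j$th iterate of the operator $\Psi$. We thus set $X_n^{(m)} = \sum_{j=0}^{m-1}\Psi^j(\varepsilon_{n-j}) + \sum_{j=m}^{\infty}\Psi^j(\varepsilon_{n-j}^{(n)})$. It is easy to verify that for every $A$ in $\mathcal{L}$, $\nu_p(A(Y)) \le \|A\|_{\mathcal{L}}\,\nu_p(Y)$. Since $X_m - X_m^{(m)} = \sum_{j=m}^{\infty}\big(\Psi^j(\varepsilon_{m-j}) - \Psi^j(\varepsilon_{m-j}^{(m)})\big)$, it follows that $\nu_p(X_m - X_m^{(m)}) \le 2\sum_{j=m}^{\infty}\|\Psi\|_{\mathcal{L}}^j\,\nu_p(\varepsilon_0) = O(1) \times \nu_p(\varepsilon_0)\,\|\Psi\|_{\mathcal{L}}^m$. By assumption $\nu_2(\varepsilon_0) < \infty$ and therefore $\sum_{m=1}^{\infty}\nu_2(X_m - X_m^{(m)}) < \infty$, so condition (2.3) holds with $p \ge 2$, as long as $\nu_p(\varepsilon_0) < \infty$.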
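The bound of Example 2.1 can be compared with a Monte Carlo estimate. In the sketch below (all choices are hypothetical): curves live on a grid of 50 points, $\Psi$ is an integral operator with a Gaussian-type kernel rescaled to Hilbert-Schmidt norm 0.5, the innovations are Brownian motions, and lags beyond 60 are dropped. It estimates $\nu_2(X_m - X_m^{(m)})$ for the coupling construction and checks it against $2\sum_{j\ge m}\|\Psi\|_{\mathcal{L}}^j\,\nu_2(\varepsilon_0)$.

```python
import numpy as np

G, N_LAGS = 50, 60                 # grid size; innovations older than N_LAGS lags are dropped
grid = np.arange(1, G + 1) / G
dt = 1.0 / G

# Hypothetical kernel psi(t, s), rescaled so that its Hilbert-Schmidt norm equals 0.5.
psi = np.exp(-(grid[:, None] ** 2 + grid[None, :] ** 2) / 2)
psi *= 0.5 / np.sqrt(np.sum(psi ** 2) * dt * dt)
M = psi * dt                       # matrix of x -> int psi(., s) x(s) ds
op_norm = np.linalg.norm(M, 2)     # ||Psi||_L, at most the Hilbert-Schmidt norm 0.5


def brownian(rng, n):
    """n standard Brownian motions on the grid, used as the innovations eps_n."""
    return np.cumsum(rng.standard_normal((n, G)) * np.sqrt(dt), axis=1)


def l2(x):
    return np.sqrt(np.sum(x ** 2, axis=-1) * dt)


def coupling_error(m, reps=400, seed=0):
    """Monte Carlo nu_2(X_m - X_m^{(m)}), where
    X_m - X_m^{(m)} = sum_{j >= m} Psi^j(eps_{m-j} - eps^{(m)}_{m-j})."""
    rng = np.random.default_rng(seed)
    power = np.linalg.matrix_power(M, m)                 # Psi^m
    diff = np.zeros((reps, G))
    for _ in range(m, N_LAGS):
        eta = brownian(rng, reps) - brownian(rng, reps)  # eps_{m-j} minus its copy
        diff += eta @ power.T                            # add Psi^j(eta), row-wise
        power = M @ power                                # Psi^{j+1}
    return np.sqrt(np.mean(l2(diff) ** 2))


nu2_eps = np.sqrt(np.sum(grid) * dt)   # nu_2(eps_0) = (sum_i t_i dt)^(1/2) for Brownian motion
errors, bounds = {}, {}
for m in (1, 2, 4, 8):
    errors[m] = coupling_error(m)
    bounds[m] = 2 * op_norm ** m / (1 - op_norm) * nu2_eps   # 2 sum_{j>=m} ||Psi||^j nu_2(eps_0)
    print(f"m={m}: estimate {errors[m]:.2e}, bound {bounds[m]:.2e}")
```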
The argument in the above example shows that a sufficient condition to obtain $L^p$-m-approximability is

$\|f(a_m, \ldots, a_1, x_0, x_{-1}, \ldots) - f(a_m, \ldots, a_1, y_0, y_{-1}, \ldots)\| \le c_m \|f(x_0, x_{-1}, \ldots) - f(y_0, y_{-1}, \ldots)\|,$

where $\sum_{m \ge 1} c_m < \infty$. This holds for a functional AR(1) process (with $c_m = \|\Psi\|_{\mathcal{L}}^m$) and offers an attractive sufficient and distribution-free condition for $L^p$-m-approximability. The interesting question whether one can impose some other, more general conditions on the function $f$ that would imply $L^p$-m-approximability remains open. For example, the simple criterion above does not apply to general linear processes. We recall that a sequence $\{X_n\}$ is said to be a linear process in $H$ if $X_n = \sum_{j=0}^{\infty}\Psi_j(\varepsilon_{n-j})$, where the errors $\varepsilon_n \in L^2_H$ are i.i.d. and zero mean, and each $\Psi_j$ is a bounded operator. If $\sum_{j=1}^{\infty}\|\Psi_j\|_{\mathcal{L}}^2 < \infty$, then the series defining $X_n$ converges a.s. and in $L^2_H$ (see Section 7.1 of Bosq [14]).

A direct verification, following the lines of Example 2.1, yields sufficient 
conditions for a general linear process to be L p -m-approximable. 

Proposition 2.1. Suppose $\{X_n\} \in L^2_H$ is a linear process whose errors satisfy $\nu_p(\varepsilon_0) < \infty$, $p \ge 2$. The operator coefficients satisfy

(2.8)  $\sum_{m=1}^{\infty}\sum_{j=m}^{\infty}\|\Psi_j\|_{\mathcal{L}} < \infty.$

Then $\{X_n\}$ is $L^p$-m-approximable.

We note that condition (2.8) is comparable to the usual assumptions made in the scalar case. For a scalar linear process $X_n = \sum_{j \ge 0}\psi_j\varepsilon_{n-j}$ the weakest possible condition for weak dependence is

(2.9)  $\sum_{j=0}^{\infty}|\psi_j| < \infty.$

If it is violated, the resulting time series are referred to as strongly dependent, long memory, long-range dependent or persistent. Recall that (2.9) merely ensures the existence of fundamental population objects like an absolutely summable autocovariance sequence or a bounded spectral density. It is, however, too weak to establish any statistical results. For example, for the asymptotic normality of the sample autocorrelations we need $\sum_j j\psi_j^2 < \infty$, and for the convergence of the periodogram ordinates $\sum_j \sqrt{j}\,|\psi_j| < \infty$. Many authors assume $\sum_j j|\psi_j| < \infty$ to be able to use all these basic results. Since $\sum_{m\ge1}\sum_{j\ge m}|\psi_j| = \sum_{j\ge1} j|\psi_j|$, the condition $\sum_j j|\psi_j| < \infty$ is equivalent to (2.8).
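The identity behind the equivalence with (2.8), $\sum_{m\ge1}\sum_{j\ge m} a_j = \sum_{j\ge1} j\,a_j$ for nonnegative $a_j$, holds because each $a_j$ is counted once for every $m \le j$. The sketch checks it exactly on a finite truncation with rational arithmetic, and then uses the hypothetical sequence $\psi_j = j^{-3/2}$, which satisfies (2.9) although the weighted partial sums $\sum_{j\le J} j\psi_j$ keep growing (like $2\sqrt{J}$).

```python
from fractions import Fraction


def double_tail_sum(a):
    """sum_{m=1}^{J} sum_{j=m}^{J} a_j for a = (a_1, ..., a_J)."""
    J = len(a)
    return sum(sum(a[j - 1] for j in range(m, J + 1)) for m in range(1, J + 1))


def weighted_sum(a):
    """sum_{j=1}^{J} j * a_j."""
    return sum(j * a_j for j, a_j in enumerate(a, start=1))


# Exact check with psi_j = 2^(-j), j = 1, ..., 30.
a = [Fraction(1, 2 ** j) for j in range(1, 31)]
print(double_tail_sum(a) == weighted_sum(a))  # True

# psi_j = j^(-3/2): (2.9) holds, but sum_j j * psi_j diverges.
b = [j ** -1.5 for j in range(1, 10_001)]
print(sum(b), weighted_sum(b))
```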
We next give a simple example of a nonlinear $L^p$-m-approximable sequence. It is based on the model used by Maslova et al. [44] to simulate the so-called solar quiet (Sq) variation in magnetometer records (see also Maslova et al. [43]). In that model, $X_n(t) = U_n(S(t) + Z_n(t))$ represents the part of the magnetometer record on day $n$ which reflects the magnetic field generated by ionospheric winds of charged particles driven by solar heating. These winds flow in two elliptic cells, one on each day-side of the equator. Their position changes from day to day, causing a different appearance of the curves $X_n(t)$, with changes in the amplitude being most pronounced. To simulate this behavior, $S(t)$ is introduced as the typical pattern for a specific magnetic observatory, $Z_n(t)$ as the change in shape on day $n$ and the scalar random variable $U_n$ as the amplitude on day $n$. With this motivation, we formulate the following example.

Example 2.2 (Product model). Suppose $\{Y_n\} \in L^p_H$ and $\{U_n\} \in L^p$ are both $L^p$-m-approximable sequences, independent of each other. The respective representations are $Y_n = g(\eta_n, \eta_{n-1}, \ldots)$ and $U_n = h(\gamma_n, \gamma_{n-1}, \ldots)$. Each of these sequences could be a linear sequence satisfying the assumptions of Proposition 2.1, but they need not be. The sequence $X_n(t) = U_n Y_n(t)$ is then a nonlinear $L^p$-m-approximable sequence with the underlying i.i.d. variables $\varepsilon_n = (\eta_n, \gamma_n)$. This follows, after a slight modification, from Lemma 2.1.

Example 2.2 illustrates the principle that in order for products of $L^p$-m-approximable sequences to be $L^p$-m-approximable, independence must be assumed. It does not have to be assumed as directly as in Example 2.2; the important point is that appropriately defined functional Volterra expansions should not contain diagonal terms, so that moments do not pile up. Such expansions exist for all nonlinear scalar processes used to model financial data (see, e.g., Giraitis, Kokoszka and Leipus [28]). The model $X_n(t) = Y_n(t)U_n$ is similar to the popular scalar stochastic volatility model $r_n = v_n\varepsilon_n$ used to model returns $r_n$ on a speculative asset. The dependent sequence $\{v_n\}$ models volatility, and the i.i.d. errors $\varepsilon_n$, independent of the $v_n$, generate unpredictability in returns.

Our next examples focus on functional extensions of popular nonlinear 
models, namely the bilinear model of [32] and the celebrated ARCH model 
of Engle [27]. Both models will be treated in more detail in forthcoming 
papers. Proofs of Propositions 2.2 and 2.3 are available upon request. 

Example 2.3 (Functional bilinear process). Let $(\varepsilon_n)$ be an $H$-valued i.i.d. sequence and let $\psi \in H \otimes H$ and $\phi \in H \otimes H \otimes H$. Then the process defined by the recurrence equation

$X_{n+1}(t) = \int \psi(t,s)X_n(s)\,ds + \iint \phi(t,s,u)X_n(s)\varepsilon_n(u)\,ds\,du + \varepsilon_{n+1}(t)$

is called a functional bilinear process.

A neater notation can be achieved by defining $\psi: H \to H$, the kernel operator with the kernel function $\psi(t,s)$, and $\phi_n: H \to H$, the random kernel operator with kernel

$\phi_n(t,s) = \int \phi(t,s,u)\varepsilon_n(u)\,du.$

In this notation, we have

(2.10)  $X_{n+1} = (\psi + \phi_n)(X_n) + \varepsilon_{n+1}$

with the usual convention that $(A + B)(x) = A(x) + B(x)$ for operators $A, B$. The product of two operators $AB(x)$ is interpreted as successive application $A(B(x))$.

A formal solution to (2.10) is

(2.11)  $X_{n+1} = \sum_{k=0}^{\infty}\prod_{j=0}^{k-1}(\psi + \phi_{n-j})(\varepsilon_{n+1-k})$

and the approximating sequence is defined by

$X_{n+1}^{(m)} = \sum_{k=0}^{m}\prod_{j=0}^{k-1}(\psi + \phi_{n-j})(\varepsilon_{n+1-k}).$

The following proposition establishes sufficient conditions for $L^p$-m-approximability.

Proposition 2.2. Let $\{X_n\}$ be the functional bilinear process defined in (2.10). If $E\log\|\varepsilon_0\| < \infty$ and $E\log\|\psi + \phi_0\| < 0$, then a unique strictly stationary solution for this equation exists. The solution has the ($L^2$-)representation (2.11). If $\nu_p((\psi + \phi_0)(\varepsilon_0)) < \infty$ and $E\|\psi + \phi_0\|_S < 1$, the process is $L^p$-m-approximable.
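A discretized sketch of the bilinear recursion (2.10) follows. The kernels $\psi$ and $\phi$, the Brownian-motion innovations, the 30-point grid and the use of a 60-term truncation as a stand-in for $X_{n+1}$ are all hypothetical choices. The sketch estimates $E\|\psi + \phi_0\|_S$ by Monte Carlo and shows the m-term approximations $X_{n+1}^{(m)}$ approaching the long truncation. Running (2.10) from $X_{n+1-m} = \varepsilon_{n+1-m}$ reproduces exactly the terms $k = 0, \ldots, m$ of (2.11).

```python
import numpy as np

G = 30
grid = (np.arange(G) + 0.5) / G
dt = 1.0 / G
rng = np.random.default_rng(7)

# Hypothetical kernels psi(t, s) and phi(t, s, u).
psi = 0.3 * np.exp(-np.abs(grid[:, None] - grid[None, :]))
phi = 0.2 * (np.sin(np.pi * grid)[:, None, None]
             * np.cos(np.pi * grid)[None, :, None]
             * grid[None, None, :])


def brownian(n):
    """Hypothetical H-valued innovations: n Brownian motions on the grid."""
    return np.cumsum(rng.standard_normal((n, G)) * np.sqrt(dt), axis=1)


def kernel(eps_n):
    """Kernel of psi + phi_n, with phi_n(t, s) = int phi(t, s, u) eps_n(u) du."""
    return psi + np.tensordot(phi, eps_n, axes=([2], [0])) * dt


def truncated_solution(eps):
    """Given eps = (eps_{n+1-m}, ..., eps_{n+1}), run (2.10) from X_{n+1-m} = eps_{n+1-m};
    the result is X_{n+1}^{(m)} = sum_{k=0}^{m} prod_{j<k} (psi + phi_{n-j})(eps_{n+1-k})."""
    x = eps[0]
    for prev, new in zip(eps[:-1], eps[1:]):
        x = (kernel(prev) * dt) @ x + new     # (psi + phi_prev)(x) + eps_new
    return x


def l2(x):
    return np.sqrt(np.sum(x ** 2) * dt)


# Monte Carlo estimate of E||psi + phi_0||_S (Proposition 2.2 asks for < 1).
hs_mean = np.mean([np.sqrt(np.sum(kernel(e) ** 2) * dt * dt) for e in brownian(2000)])

# Compare X_{n+1}^{(m)} with a long (m = 60) truncation standing in for X_{n+1}.
L = 60
sq = {m: [] for m in (1, 2, 4, 8)}
for _ in range(200):
    eps = brownian(L + 1)
    x_long = truncated_solution(eps)
    for m in sq:
        sq[m].append(l2(x_long - truncated_solution(eps[-(m + 1):])) ** 2)
rms = {m: np.sqrt(np.mean(v)) for m, v in sq.items()}
print(hs_mean, rms)
```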



Example 2.4 (Functional ARCH). Let $\delta \in H$ be a positive function and let $\{\varepsilon_k\}$ be an i.i.d. sequence in $L^4_H$. Further, let $\beta(s,t)$ be a nonnegative kernel function in $L^2([0,1]^2, \mathcal{B}_{[0,1]^2}, \lambda^2)$. Then we call the process

(2.12)  $y_k(t) = \varepsilon_k(t)\sigma_k(t), \qquad t \in [0,1],$

where

(2.13)  $\sigma_k^2(t) = \delta(t) + \int_0^1 \beta(t,s)\,y_{k-1}^2(s)\,ds,$

the functional ARCH(1) process.

Proposition 2.3 establishes conditions for the existence of a strictly stationary solution to (2.12) and (2.13) and its $L^p$-m-approximability.

Proposition 2.3. Define $K(\varepsilon_1^2) = \left(\iint \beta^2(t,s)\,\varepsilon_1^4(s)\,ds\,dt\right)^{1/2}$. If there is some $p > 0$ such that $E\{K(\varepsilon_1^2)\}^p < 1$, then (2.12) and (2.13) have a unique strictly stationary and causal solution and the sequence $\{y_k\}$ is $L^p$-m-approximable.
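A discretized sketch of the functional ARCH(1) recursion (2.12)-(2.13) follows, under hypothetical choices of $\delta$, $\beta$, a 50-point grid and Brownian-motion innovations. It estimates $E\,K(\varepsilon_1^2)$ (the condition of Proposition 2.3 with $p = 1$) and shows that restarting the recursion from zero $m$ steps back changes the current curve less and less as $m$ grows, which is the kind of fading memory $L^p$-m-approximability quantifies. Since $\beta \ge 0$, every simulated $\sigma_k^2$ stays above $\delta$.

```python
import numpy as np

G = 50
grid = (np.arange(G) + 0.5) / G
dt = 1.0 / G
rng = np.random.default_rng(11)

delta = 0.1 + 0.05 * grid                   # hypothetical positive function delta(t)
beta = 0.5 * np.outer(1 - grid, 1 - grid)   # hypothetical nonnegative kernel beta(t, s)


def brownian(n):
    """Hypothetical innovations eps_k: n Brownian motions on the grid (they lie in L^4_H)."""
    return np.cumsum(rng.standard_normal((n, G)) * np.sqrt(dt), axis=1)


def K_of(eps):
    """K(eps^2) = (int int beta^2(t, s) eps^4(s) ds dt)^(1/2), via Riemann sums."""
    return np.sqrt(np.sum(beta ** 2 * eps[None, :] ** 4) * dt * dt)


def run(eps_window, y):
    """Apply (2.13) and then (2.12) for each eps_k in eps_window, starting from y_{k-1} = y."""
    sigma2 = delta
    for eps in eps_window:
        sigma2 = delta + (beta @ y ** 2) * dt   # sigma_k^2(t)
        y = eps * np.sqrt(sigma2)               # y_k(t)
    return y, sigma2


# Proposition 2.3 with p = 1: Monte Carlo estimate of E K(eps_1^2).
EK = np.mean([K_of(e) for e in brownian(5000)])

# Restart the recursion from zero m steps back and compare with a 40-step run.
L, reps = 40, 300
rms = {}
for m in (1, 2, 4, 8):
    sq = []
    for _ in range(reps):
        eps = brownian(L)
        y_long, _ = run(eps, np.zeros(G))
        y_m, _ = run(eps[-m:], np.zeros(G))
        sq.append(np.sum((y_long - y_m) ** 2) * dt)
    rms[m] = np.sqrt(np.mean(sq))

_, sigma2_last = run(brownian(L), np.zeros(G))
print(EK, rms, bool(np.all(sigma2_last >= delta)))
```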

3. Convergence of eigenvalues and eigenfunctions. Denote by $C = E[\langle X, \cdot\rangle X]$ the covariance operator of some $X \in L^2_H$. The eigenvalues and eigenfunctions of $C$ are a fundamental ingredient for principal component analysis, which is a key technique in functional data analysis. In practice, $C$ and its eigenvalues/eigenfunctions are unknown and must be estimated. The purpose of this section is to prove consistency of the corresponding estimates for $L^4$-m-approximable sequences. The results derived below will be applied in the following sections. We start with some preliminary results.

Consider two compact operators $C, K \in \mathcal{L}$ with singular value decompositions

(3.1)  $C(x) = \sum_{j=1}^{\infty}\lambda_j\langle x, v_j\rangle f_j, \qquad K(x) = \sum_{j=1}^{\infty}\gamma_j\langle x, u_j\rangle g_j.$

The following lemma is proven in Section VI.1 of Gohberg, Goldberg and Kaashoek [30] (see Corollary 1.6, page 99).

Lemma 3.1. Suppose $C, K \in \mathcal{L}$ are two compact operators with singular value decompositions (3.1). Then, for each $j \ge 1$, $|\gamma_j - \lambda_j| \le \|K - C\|_{\mathcal{L}}$.

We now tighten the conditions on the operator $C$ by assuming that it is Hilbert-Schmidt, symmetric and positive definite. These conditions imply that $f_j = v_j$ in (3.1), $C(v_j) = \lambda_j v_j$ and $\sum_j \lambda_j^2 < \infty$. Consequently the $\lambda_j$ are eigenvalues of $C$ and the $v_j$ the corresponding eigenfunctions. We also define

$v_j' = c_j v_j, \qquad c_j = \mathrm{sign}(\langle u_j, v_j\rangle).$

Using Lemma 3.1, the next lemma can be established by following the lines 
of the proof of Lemma 4.3 of Bosq [14]. 

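Lemma 3.2. Suppose $C, K \in \mathcal{L}$ are two compact operators with singular value decompositions (3.1). If $C$ is Hilbert-Schmidt, symmetric and positive definite, and its eigenvalues satisfy

(3.2)  $\lambda_1 > \lambda_2 > \cdots > \lambda_d > \lambda_{d+1},$

then

$\|u_j - v_j'\| \le \frac{2\sqrt{2}}{\alpha_j}\|K - C\|_{\mathcal{L}}, \qquad 1 \le j \le d,$

where $\alpha_1 = \lambda_1 - \lambda_2$ and $\alpha_j = \min(\lambda_{j-1} - \lambda_j, \lambda_j - \lambda_{j+1})$, $2 \le j \le d$.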
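Lemmas 3.1 and 3.2 can be checked numerically in finite dimensions, where compact operators are just matrices. The sketch below (a hypothetical setup, with $d = 6$, eigenvalues $6, 5, \ldots, 1$ and a small symmetric perturbation so that $K$ stays positive definite and its singular vectors coincide with its eigenvectors) verifies the eigenvalue bound of Lemma 3.1 and the sign-aligned eigenvector bound of Lemma 3.2 for $j = 1, \ldots, d-1$.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 6
lam = np.arange(d, 0, -1).astype(float)          # lambda_1 > ... > lambda_d: 6, 5, ..., 1
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # columns v_1, ..., v_d (orthonormal)
C = Q @ np.diag(lam) @ Q.T                       # symmetric, positive definite

E = 0.01 * rng.standard_normal((d, d))
K = C + (E + E.T) / 2                            # symmetric perturbation, still positive definite
op = np.linalg.norm(K - C, 2)                    # ||K - C||_L

gam, U = np.linalg.eigh(K)
gam, U = gam[::-1], U[:, ::-1]                   # decreasing order, columns u_1, ..., u_d

# Lemma 3.1: |gamma_j - lambda_j| <= ||K - C||_L.
weyl_gap = np.max(np.abs(gam - lam)) - op

# Lemma 3.2 for j = 1, ..., d - 1 (lambda_{j+1} must exist).
alpha = [lam[0] - lam[1]] + [min(lam[j - 1] - lam[j], lam[j] - lam[j + 1]) for j in range(1, d - 1)]
ratios = []
for j in range(d - 1):
    c = np.sign(U[:, j] @ Q[:, j])               # c_j = sign(<u_j, v_j>)
    lhs = np.linalg.norm(U[:, j] - c * Q[:, j])  # ||u_j - v_j'||
    ratios.append(lhs / (2 * np.sqrt(2) / alpha[j] * op))
print(weyl_gap <= 1e-12, max(ratios))
```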
Let $\{X_n\} \in L^2_H$ be a stationary sequence with covariance operator $C$. In principle we could now develop a general theory for $H$ valued sequences, where $H$ is an arbitrary separable Hilbert space. In practice, however, the case $H = L^2([0,1], \mathcal{B}_{[0,1]}, \lambda)$ is most important. In order to be able to fully use the structure of $H$ and not to deal with technical assumptions, we need the two basic regularity conditions below, which will be assumed throughout the paper without further notice.

Assumption 3.1. (i) Each $X_n$ is measurable $(\mathcal{B}_{[0,1]} \times \mathcal{A})/\mathcal{B}_{\mathbb{R}}$.
(ii) $\sup_{t\in[0,1]} E|X(t)|^2 < \infty$.

Assumption 3.1(i) is necessary in order that the sample paths of $X_n$ are measurable. Together with (ii) it also implies that $C$ is an integral operator with kernel $c(t,s) = \mathrm{Cov}(X_1(t), X_1(s))$ whose estimator is

(3.3)  $\hat{c}(t,s) = N^{-1}\sum_{n=1}^{N}(X_n(t) - \bar{X}_N(t))(X_n(s) - \bar{X}_N(s)).$

Then natural estimators of the eigenvalues $\lambda_j$ and eigenfunctions $v_j$ of $C$ are the eigenvalues $\hat\lambda_j$ and eigenfunctions $\hat v_j$ of $\hat C$, the operator with the kernel (3.3). By Lemmas 3.1 and 3.2, and since $\|\cdot\|_{\mathcal{L}} \le \|\cdot\|_S$, we can bound the estimation errors for eigenvalues and eigenfunctions by $\|\hat C - C\|_S$. Mas and Menneteau [42] show that transferring asymptotic results from the operators to the eigenelements holds quite generally, including a.s. convergence, weak convergence or large deviation principles. This motivates the next result.
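The estimators $\hat c$, $\hat\lambda_j$ and $\hat v_j$ described above can be sketched for curves observed on a common equispaced grid, approximating the integral operator with kernel (3.3) by a Riemann sum (our discretization, not part of the paper). The synthetic data are a hypothetical two-component model $X_n = \xi_{n1}v_1 + \xi_{n2}v_2$ with $v_1 = \sqrt2\sin(2\pi t)$, $v_2 = \sqrt2\cos(2\pi t)$, $\lambda_1 = 2$, $\lambda_2 = 0.5$, drawn i.i.d. for simplicity. Eigenfunctions are identified only up to sign, hence the alignment $\hat c_j = \mathrm{sign}(\langle\hat v_j, v_j\rangle)$.

```python
import numpy as np


def fpca(curves, dt, n_components):
    """Eigenvalues/eigenfunctions of the operator with kernel (3.3).

    curves: array (N, G) of curves on an equispaced grid with spacing dt.
    The integral operator is approximated by the matrix c_hat * dt (Riemann sum);
    its eigenvectors are rescaled to have unit L^2 norm.
    """
    centered = curves - curves.mean(axis=0)          # X_n - X_bar_N
    c_hat = centered.T @ centered / curves.shape[0]  # c_hat(t_i, t_k), divisor N as in (3.3)
    w, vecs = np.linalg.eigh(c_hat * dt)
    order = np.argsort(w)[::-1][:n_components]
    return w[order], vecs[:, order].T / np.sqrt(dt)  # rows are eigenfunctions


G = 100
t = (np.arange(G) + 0.5) / G
dt = 1.0 / G
v1, v2 = np.sqrt(2) * np.sin(2 * np.pi * t), np.sqrt(2) * np.cos(2 * np.pi * t)

rng = np.random.default_rng(0)
N = 2000
scores = rng.standard_normal((N, 2)) * np.sqrt([2.0, 0.5])   # lambda_1 = 2, lambda_2 = 0.5
X = scores[:, :1] * v1 + scores[:, 1:] * v2

lam_hat, v_hat = fpca(X, dt, 2)
errs = []
for vh, v in zip(v_hat, (v1, v2)):
    c = np.sign(np.sum(vh * v) * dt)                 # c_hat_j = sign(<v_hat_j, v_j>)
    errs.append(np.sqrt(np.sum((c * vh - v) ** 2) * dt))
print(lam_hat, errs)
```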

Theorem 3.1. Suppose $\{X_n\} \in L^4_H$ is an $L^4$-m-approximable sequence with covariance operator $C$. Then there is some constant $U_X < \infty$, which does not depend on $N$, such that

(3.4)  $N E\|\hat C - C\|_S^2 \le U_X.$

If the $X_n$ have zero mean, then we can choose

(3.5)  $U_X = \nu_4^4(X) + 4\sqrt{2}\,\nu_4^3(X)\sum_{r=1}^{\infty}\nu_4(X_r - X_r^{(r)}).$

The proof of Theorem 3.1 is given in Section A.1. Let us note that by Lemma 3.1 and Theorem 3.1,

$N E[|\hat\lambda_j - \lambda_j|^2] \le N E\|\hat C - C\|_{\mathcal{L}}^2 \le N E\|\hat C - C\|_S^2 \le U_X.$

Assuming (3.2), by Lemma 3.2 and Theorem 3.1 [with $\hat c_j = \mathrm{sign}(\langle \hat v_j, v_j\rangle)$],

$N E[\|\hat c_j \hat v_j - v_j\|^2] \le \left(\frac{2\sqrt{2}}{\alpha_j}\right)^2 N E\|\hat C - C\|_{\mathcal{L}}^2 \le \frac{8}{\alpha_j^2} N E\|\hat C - C\|_S^2 \le \frac{8 U_X}{\alpha_j^2},$

with the $\alpha_j$ defined in Lemma 3.2.

These inequalities establish the following result. 

Theorem 3.2. Suppose $\{X_n\} \in L^4_H$ is an $L^4$-m-approximable sequence and assumption (3.2) holds. Then, for $1 \le j \le d$,

(3.6)  $\limsup_{N\to\infty} N E[|\hat\lambda_j - \lambda_j|^2] < \infty, \qquad \limsup_{N\to\infty} N E[\|\hat c_j \hat v_j - v_j\|^2] < \infty.$
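A rough Monte Carlo illustration of the $N^{-1}$ rate in (3.4) under dependence follows. The model is hypothetical: $X_n = \xi_{n1}v_1 + 0.5\,\xi_{n2}v_2$ with orthonormal $v_1, v_2$ and independent scalar AR(1) score sequences with coefficient 0.5, which is $L^4$-m-approximable by Example 2.1. Because $v_1, v_2$ are orthonormal, $\|\hat C - C\|_S$ equals the Frobenius norm of a $2\times2$ coefficient matrix, so no grid is needed. The scaled error $N E\|\hat C - C\|_S^2$ should stay roughly constant as $N$ grows.

```python
import numpy as np

PHI, BURN = 0.5, 200
SCALE = np.array([1.0, 0.5])   # X_n = xi_{n,1} v_1 + 0.5 xi_{n,2} v_2 (hypothetical)


def ar1_scores(rng, reps, n, dim=2):
    """reps independent runs of dim independent AR(1) score series
    xi_{i,k} = PHI * xi_{i-1,k} + e_{i,k}; returns an array of shape (reps, n, dim)."""
    e = rng.standard_normal((reps, n + BURN, dim))
    xi = np.zeros_like(e)
    for i in range(1, n + BURN):
        xi[:, i] = PHI * xi[:, i - 1] + e[:, i]
    return xi[:, BURN:] * SCALE


def scaled_hs_error(n, reps, rng):
    """Monte Carlo N * E||C_hat - C||_S^2, computed on the 2 x 2 coefficient matrices."""
    xi = ar1_scores(rng, reps, n)
    centered = xi - xi.mean(axis=1, keepdims=True)
    c_hat = np.einsum("rik,ril->rkl", centered, centered) / n   # (3.3) in coefficients
    c_true = np.diag(SCALE ** 2) / (1 - PHI ** 2)                # eigenvalues 4/3 and 1/3
    return n * np.mean(np.sum((c_hat - c_true) ** 2, axis=(1, 2)))


rng = np.random.default_rng(2)
results = {n: scaled_hs_error(n, reps=300, rng=rng) for n in (100, 400, 1600)}
print(results)   # roughly constant in N, as (3.4) predicts
```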

Relations (3.6) have become a fundamental tool for establishing asymptotic properties of procedures for functional simple random samples which are based on the functional principal components. Theorem 3.2 shows that in many cases one can expect these properties to remain the same under weak dependence; an important example is discussed in Section 6. The empirical covariance kernel (3.3) is, however, clearly designed for simple random samples, and may not be optimal for representing dependent data in the most "useful" way. The term "useful" depends on the application. Kargin and Onatski [38] show that a basis different from the eigenfunctions $v_k$ is optimal for prediction with a functional AR(1) model. An interesting open problem is how to construct a basis optimal in some general sense for dependent data. In Section 4 we focus on a related, but different, problem of constructing a matrix which "soaks up" the dependence in a manner that allows the extension of many multivariate time series procedures to a functional setting. The construction of this matrix involves arbitrary basis vectors $v_k$ estimated by $\hat v_k$ in such a way that (3.6) holds.

4. Estimation of the long-run variance. The main results of this section 
are Corollary 4.1 and Proposition 4.1 which state that the long-run variance 
matrix obtained by projecting the data on the functional principal compo- 
nents can be consistently estimated. The concept of the long-run variance, 
while fundamental in time series analysis, has not been studied for func- 
tional data, and not even for scalar approximable sequences. It is therefore 
necessary to start with some preliminaries which lead to our main results and illustrate the role of $L^p$-m-approximability.

Let $\{X_n\}$ be a scalar (weakly) stationary sequence. Its long-run variance is defined as $\sigma^2 = \sum_{j\in\mathbb Z}\gamma_j$, where $\gamma_j = \operatorname{Cov}(X_0, X_j)$, provided this series is absolutely convergent. Our first lemma shows that this is the case for $L^2$-m-approximable sequences.

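Lemma 4.1. Suppose $\{X_n\}$ is a scalar $L^2$-m-approximable sequence. Then its autocovariance function $\gamma_j = \operatorname{Cov}(X_0, X_j)$ is absolutely summable, that is, $\sum_{j=-\infty}^{\infty}|\gamma_j| < \infty$.

Proof. Observe that for $j > 0$,

$$\operatorname{Cov}(X_0, X_j) = \operatorname{Cov}\bigl(X_0, X_j - X_j^{(j)}\bigr) + \operatorname{Cov}\bigl(X_0, X_j^{(j)}\bigr).$$

Since

$$X_0 = f(\varepsilon_0, \varepsilon_{-1}, \dots), \qquad X_j^{(j)} = f\bigl(\varepsilon_j, \varepsilon_{j-1}, \dots, \varepsilon_1, \varepsilon_0^{(j)}, \varepsilon_{-1}^{(j)}, \dots\bigr),$$

the random variables $X_0$ and $X_j^{(j)}$ are independent, so $\operatorname{Cov}(X_0, X_j^{(j)}) = 0$, and

$$|\gamma_j| \le \bigl[E X_0^2\bigr]^{1/2}\bigl[E\bigl(X_j - X_j^{(j)}\bigr)^2\bigr]^{1/2}.$$

Summing over $j$ and using $L^2$-m-approximability (together with $\gamma_{-j} = \gamma_j$) gives the claim. □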

The summability of the autocovariances is the fundamental property of weak dependence because then $N\operatorname{Var}[\bar X_N] \to \sum_{j=-\infty}^{\infty}\gamma_j$, that is, the variance of the sample mean converges to zero at the rate $N^{-1}$, the same as for i.i.d. observations. A popular approach to the estimation of the long-run variance is to use the kernel estimator

$$\hat\sigma^2 = \sum_{|j|\le q}\omega_q(j)\hat\gamma_j, \qquad \hat\gamma_j = \frac1N\sum_{i=1}^{N-|j|}\bigl(X_i - \bar X_N\bigr)\bigl(X_{i+|j|} - \bar X_N\bigr).$$

Various weights $\omega_q(j)$ have been proposed and their optimality properties studied (see Anderson [1] and Andrews [3], among others). In theoretical work, it is typically assumed that the bandwidth $q$ is a deterministic function of the sample size such that $q = q(N) \to \infty$ and $q = o(N^r)$ for some $0 < r < 1$. We will use the following assumption:

Assumption 4.1. The bandwidth $q = q(N)$ satisfies $q \to \infty$, $q^2/N \to 0$, and the weights satisfy $\omega_q(j) = \omega_q(-j)$ and

$$|\omega_q(j)| \le b \tag{4.1}$$

and, for every fixed $j$,

$$\omega_q(j) \to 1. \tag{4.2}$$

All kernels used in practice have symmetric weights and satisfy conditions (4.1) and (4.2).
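Below is a minimal Python sketch of the kernel estimator $\hat\sigma^2$. It uses the Bartlett weights $\omega_q(j) = 1 - |j|/(1+q)$ for $|j| \le q$, which are one choice satisfying (4.1) with $b = 1$ and (4.2). The weight function is a parameter, so other kernels can be swapped in. The helper `simulate_ar1` is a hypothetical test generator and is not part of the paper.

```python
import random


def bartlett_weight(j, q):
    """Bartlett weights: symmetric, bounded by 1, and -> 1 for fixed j as q grows."""
    return 1.0 - abs(j) / (1.0 + q) if abs(j) <= q else 0.0


def sample_autocov(x, j):
    """gamma_hat_j = N^{-1} sum_{i=1}^{N-|j|} (X_i - Xbar)(X_{i+|j|} - Xbar)."""
    N, j = len(x), abs(j)
    xbar = sum(x) / N
    return sum((x[i] - xbar) * (x[i + j] - xbar) for i in range(N - j)) / N


def kernel_lrv(x, q, weight=bartlett_weight):
    """sigma_hat^2 = sum_{|j| <= q} omega_q(j) gamma_hat_j."""
    return sum(weight(j, q) * sample_autocov(x, j) for j in range(-q, q + 1))


def simulate_ar1(phi, n, seed=0, burn=500):
    """Scalar AR(1) with N(0,1) innovations; its long-run variance is 1/(1-phi)^2."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for i in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if i >= burn:
            out.append(x)
    return out
```

For an AR(1) sequence with coefficient 0.5 and unit innovation variance, the long-run variance is $1/(1-0.5)^2 = 4$. With $N = 20000$ and $q = 30$, so that $q^2/N$ is small, `kernel_lrv(simulate_ar1(0.5, 20000), 30)` should be close to this value.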

The absolute summability of the autocovariances is not enough to establish the consistency of the kernel estimator $\hat\sigma^2$. Traditionally, summability of the cumulants has been assumed to control the fourth order structure of the data. Denoting $\mu = EX_0$, the fourth order cumulant of a stationary sequence is defined by

$$\kappa(h,r,s) = \operatorname{Cov}\bigl((X_0-\mu)(X_h-\mu), (X_r-\mu)(X_s-\mu)\bigr) - \gamma_r\gamma_{h-s} - \gamma_s\gamma_{h-r}.$$
The usual sufficient condition for the consistency of $\hat\sigma^2$ is

$$\sum_{h=-\infty}^{\infty}\sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}|\kappa(h,r,s)| < \infty. \tag{4.3}$$

Recently, Giraitis et al. [29] showed that condition (4.3) can be replaced by a weaker condition,

$$\sup_h \sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}|\kappa(h,r,s)| < \infty. \tag{4.4}$$

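A technical condition we need is

$$N^{-1}\sum_{k,\ell=0}^{q(N)}\sum_{r=1}^{N-1}\Bigl|\operatorname{Cov}\Bigl(X_0\bigl(X_k - X_k^{(k)}\bigr),\, X_r^{(r)}X_{r+\ell}^{(r+\ell)}\Bigr)\Bigr| \to 0. \tag{4.5}$$

By analogy to condition (4.4), it can be replaced by a much stronger, but more transparent, condition,

$$\sup_{k,\ell\ge 0}\sum_{r=1}^{\infty}\Bigl|\operatorname{Cov}\Bigl(X_0\bigl(X_k - X_k^{(k)}\bigr),\, X_r^{(r)}X_{r+\ell}^{(r+\ell)}\Bigr)\Bigr| < \infty. \tag{4.6}$$
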

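To explain the intuition behind conditions (4.5) and (4.6), consider the linear process $X_k = \sum_{j=0}^{\infty}c_j\varepsilon_{k-j}$. For $k > 0$,

$$X_k - X_k^{(k)} = \sum_{j=k}^{\infty}c_j\varepsilon_{k-j} - \sum_{j=k}^{\infty}c_j\varepsilon_{k-j}^{(k)}.$$

Thus $X_0(X_k - X_k^{(k)})$ depends on

$$\varepsilon_0, \varepsilon_{-1}, \varepsilon_{-2}, \dots \quad\text{and}\quad \varepsilon_0^{(k)}, \varepsilon_{-1}^{(k)}, \varepsilon_{-2}^{(k)}, \dots, \tag{4.7}$$

and $X_r^{(r)}X_{r+\ell}^{(r+\ell)}$ depends on

$$\varepsilon_{r+\ell}, \dots, \varepsilon_1, \quad \varepsilon_0^{(r)}, \varepsilon_{-1}^{(r)}, \varepsilon_{-2}^{(r)}, \dots \quad\text{and}\quad \varepsilon_0^{(r+\ell)}, \varepsilon_{-1}^{(r+\ell)}, \varepsilon_{-2}^{(r+\ell)}, \dots.$$

Consequently, the covariances in (4.6) vanish except when $r = k$ or $r + \ell = k$, so condition (4.6) always holds for linear processes.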
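As a worked check of the linear case, assume for illustration that the innovations are scalar and i.i.d. with $E\varepsilon_0 = 0$ and $E\varepsilon_0^2 = \sigma^2$, and that the coefficients are geometric, $c_j = \varphi^j$ with $|\varphi| < 1$. The copies $\varepsilon^{(m)}_{m-j}$ are independent of the $\varepsilon_{m-j}$, so the differences in the first line are independent with variance $2\sigma^2$ each. This gives the following derivation, which shows that such a process is $L^2$-m-approximable.

```latex
\begin{aligned}
X_m - X_m^{(m)} &= \sum_{j=m}^{\infty} c_j\bigl(\varepsilon_{m-j} - \varepsilon^{(m)}_{m-j}\bigr),\\
E\bigl(X_m - X_m^{(m)}\bigr)^2 &= 2\sigma^2\sum_{j=m}^{\infty} c_j^2,\\
v_2\bigl(X_m - X_m^{(m)}\bigr) &= \frac{\sqrt{2}\,\sigma\,|\varphi|^{m}}{\sqrt{1-\varphi^{2}}} \qquad (c_j = \varphi^j),\\
\sum_{m=1}^{\infty} v_2\bigl(X_m - X_m^{(m)}\bigr) &= \frac{\sqrt{2}\,\sigma}{\sqrt{1-\varphi^{2}}}\cdot\frac{|\varphi|}{1-|\varphi|} < \infty.
\end{aligned}
```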
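For general nonlinear sequences, the difference

$$X_k - X_k^{(k)} = f(\varepsilon_k, \dots, \varepsilon_1, \varepsilon_0, \varepsilon_{-1}, \dots) - f\bigl(\varepsilon_k, \dots, \varepsilon_1, \varepsilon_0^{(k)}, \varepsilon_{-1}^{(k)}, \dots\bigr)$$

cannot be expressed only in terms of the errors (4.7), but the errors $\varepsilon_k, \dots, \varepsilon_1$ should approximately cancel, so that the difference $X_k - X_k^{(k)}$ is small and very weakly correlated with $X_r^{(r)}X_{r+\ell}^{(r+\ell)}$.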

With this background, we now formulate the following result. 
Theorem 4.1. Suppose $\{X_n\} \in L^4$ is an $L^4$-m-approximable sequence and assume condition (4.5) holds. If Assumption 4.1 holds, then $\hat\sigma^2 \xrightarrow{P} \sum_{j=-\infty}^{\infty}\gamma_j$.

Theorem 4.1 is proven in Section A.1. The general plan of the proof is the same as that of the proof of Theorem 3.1 of Giraitis et al. [29], but the verification of the crucial relation (A.5) uses a new approach based on $L^4$-m-approximability. The arguments preceding (A.5) show that replacing $\bar X_N$ by $\mu = EX_0$ does not change the limit. We note that the condition $q^2/N \to 0$ we assume is stronger than the condition $q/N \to 0$ assumed by Giraitis et al. [29]. This difference is of little practical consequence, as the optimal bandwidths for the kernels used in practice are typically of the order $O(N^{1/5})$. Finally, we note that by further strengthening the conditions on the behavior of the bandwidth function $q = q(N)$, the convergence in probability in Theorem 4.1 could be replaced by almost sure convergence, but we do not pursue this here. The corresponding result under condition (4.4) was established by Berkes et al. [9]; it is also stated without proof as part of Theorem A.1 of Berkes et al. [10].

We now turn to the vector case in which the data are of the form

$$\mathbf X_n = [X_{1n}, X_{2n}, \dots, X_{dn}]^T, \qquad n = 1, 2, \dots, N.$$

Just as in the scalar case, the estimation of the mean by the sample mean does not affect the limit of the kernel long-run variance estimators, so we assume that $EX_{in} = 0$ and define the autocovariances as

$$\gamma_r(i,j) = E[X_{i0}X_{jr}], \qquad 1 \le i,j \le d.$$

If $r \ge 0$, $\gamma_r(i,j)$ is estimated by $N^{-1}\sum_{n=1}^{N-r}X_{in}X_{j,n+r}$, but if $r < 0$, it is estimated by $N^{-1}\sum_{n=1}^{N-|r|}X_{i,n+|r|}X_{jn}$. We therefore define the autocovariance matrices

$$\hat\Gamma_r = \begin{cases} N^{-1}\sum_{n=1}^{N-r}\mathbf X_n\mathbf X_{n+r}^T, & \text{if } r \ge 0,\\[4pt] N^{-1}\sum_{n=1}^{N-|r|}\mathbf X_{n+|r|}\mathbf X_n^T, & \text{if } r < 0.\end{cases}$$

The variance $\operatorname{Var}\bigl[N^{-1}\sum_{n=1}^{N}\mathbf X_n\bigr]$ has $(i,j)$-entry

$$N^{-2}\sum_{m,n=1}^{N}E[X_{im}X_{jn}] = N^{-1}\sum_{|r|<N}\Bigl(1 - \frac{|r|}{N}\Bigr)\gamma_r(i,j),$$

so the long-run variance is

$$\Sigma = \sum_{r=-\infty}^{\infty}\Gamma_r, \qquad \Gamma_r := \bigl[\gamma_r(i,j),\ 1 \le i,j \le d\bigr],$$
and its kernel estimator is

$$\hat\Sigma = \sum_{|r|\le q}\omega_q(r)\hat\Gamma_r. \tag{4.8}$$

The consistency of $\hat\Sigma$ can be established by following the lines of the proof of Theorem 4.1 for every fixed entry of the matrix $\hat\Sigma$. Condition (4.5) must be replaced by

$$N^{-1}\sum_{k,\ell=0}^{q(N)}\sum_{r=1}^{N-1}\max_{1\le i,j\le d}\Bigl|\operatorname{Cov}\Bigl(X_{i0}\bigl(X_{jk} - X_{jk}^{(k)}\bigr),\, X_{ir}^{(r)}X_{j,r+\ell}^{(r+\ell)}\Bigr)\Bigr| \to 0. \tag{4.9}$$

Condition (4.9) is analogous to cumulant conditions for vector processes which require summability of fourth order cross-cumulants of all scalar components (see, e.g., Andrews [3], Assumption A, page 823).
For ease of reference we state these results as a theorem.

Theorem 4.2. (a) If $\{\mathbf X_n\} \in L^2_{\mathbb R^d}$ is an $L^2$-m-approximable sequence, then the series $\sum_{r=-\infty}^{\infty}\Gamma_r$ converges absolutely. (b) Suppose $\{\mathbf X_n\} \in L^4_{\mathbb R^d}$ is an $L^4$-m-approximable sequence such that condition (4.9) holds. If Assumption 4.1 holds, then $\hat\Sigma \xrightarrow{P} \Sigma$.

We are now able to turn to functional data. Suppose $\{X_n\} \in L^2_H$ is a zero mean sequence, and $v_1, v_2, \dots, v_d$ is any set of orthonormal functions in $H$. Define $X_{in} = \int X_n(t)v_i(t)\,dt$, $\mathbf X_n = [X_{1n}, X_{2n}, \dots, X_{dn}]^T$ and $\Gamma_r = \operatorname{Cov}(\mathbf X_0, \mathbf X_r)$. A direct verification shows that if $\{X_n\}$ is $L^p$-m-approximable, then so is the vector sequence $\{\mathbf X_n\}$. We thus obtain the following corollary.

Corollary 4.1. (a) If $\{X_n\} \in L^2_H$ is an $L^2$-m-approximable sequence, then the series $\sum_{r=-\infty}^{\infty}\Gamma_r$ converges absolutely. (b) If, in addition, $\{X_n\}$ is $L^4$-m-approximable and Assumption 4.1 and condition (4.9) hold, then $\hat\Sigma \xrightarrow{P} \Sigma$.

In Corollary 4.1, the functions $v_1, v_2, \dots, v_d$ form an arbitrary orthonormal deterministic basis. In many applications, a random basis consisting of the estimated principal components $\hat v_1, \hat v_2, \dots, \hat v_d$ is used. The scores with respect to this basis are defined by

$$\hat\eta_{\ell i} = \int\bigl(X_i(t) - \bar X_N(t)\bigr)\hat v_\ell(t)\,dt, \qquad 1 \le \ell \le d.$$

To use the results established so far, it is convenient to decompose the stationary sequence $\{X_n\}$ into its mean and a zero mean process; that is, we set $X_n(t) = \mu(t) + Y_n(t)$, where $EY_n(t) = 0$. We introduce the unobservable quantities

$$\beta_{\ell n} = \int Y_n(t)v_\ell(t)\,dt, \qquad \hat\beta_{\ell n} = \int Y_n(t)\hat v_\ell(t)\,dt, \qquad 1 \le \ell \le d. \tag{4.10}$$

We then have the following proposition, which will be useful in most statistical procedures for functional time series. An application to change point detection is developed in Section 5.

Proposition 4.1. Let $C = \operatorname{diag}(\hat c_1, \dots, \hat c_d)$, with $\hat c_i = \operatorname{sign}(\langle v_i, \hat v_i\rangle)$. Suppose $\{X_n\} \in L^4_H$ is $L^4$-m-approximable and that (3.2) holds. Assume further that Assumption 4.1 holds with the stronger condition $q^4/N \to 0$. Then

$$\bigl|\hat\Sigma(\boldsymbol\beta) - \hat\Sigma(C\hat{\boldsymbol\beta})\bigr| = o_P(1) \quad\text{and}\quad \bigl|\hat\Sigma(\hat{\boldsymbol\eta}) - \hat\Sigma(\hat{\boldsymbol\beta})\bigr| = o_P(1).$$

The proof of Proposition 4.1 is delicate and is presented in Section A.1. We note that condition (4.9) does not appear in the statement of Proposition 4.1. Its point is that if $\hat\Sigma(\boldsymbol\beta)$ is consistent under some conditions, then so is $\hat\Sigma(\hat{\boldsymbol\eta})$.

5. Change point detection. Functional time series are obtained from 
data collected sequentially over time, and it is natural to expect that con- 
ditions under which observations are made may change. If this is the case, 
procedures developed for stationary series will produce spurious results. In 
this section, we develop a procedure for the detection of a change in the mean 
function of a functional time series, the most important possible change. In 
addition to its practical relevance, the requisite theory illustrates the application of the results developed in Sections 3 and 4. The main results of this section, Theorems 5.1 and 5.2, are proven in Section A.2.

We thus consider testing the null hypothesis,

$$H_0: EX_1(t) = EX_2(t) = \cdots = EX_N(t), \qquad t \in [0,1].$$

Note that under $H_0$, we do not specify the value of the common mean.

Under the alternative, $H_0$ does not hold. The test we construct has particularly good power against the alternative in which the data can be divided into several consecutive segments, and the mean is constant within each segment but changes from segment to segment. The simplest case of only two segments (one change point) is specified in Assumption 5.2. First we note that under the null hypothesis, we can represent each functional observation as

$$X_i(t) = \mu(t) + Y_i(t), \qquad EY_i(t) = 0. \tag{5.1}$$

The following assumption specifies conditions on $\mu(\cdot)$ and the errors $Y_i(\cdot)$ needed to establish the convergence of the test statistic under $H_0$.
Assumption 5.1. The mean $\mu$ in (5.1) is in $H$. The error functions $Y_i \in L^4_H$ are $L^4$-m-approximable mean zero random elements such that the eigenvalues of their covariance operator satisfy (3.2).

Recall that $L^4$-m-approximability implies that the $Y_i$ are identically distributed with $v_4(Y_i) < \infty$. In particular, their covariance function,

$$c(t,s) = E[Y_i(t)Y_i(s)], \qquad 0 \le t,s \le 1,$$

is square integrable, that is, is in $L^2([0,1]\times[0,1])$.

We develop the theory under the alternative of exactly one change point, 
but the procedure is applicable to multiple change points by using a seg- 
mentation algorithm described in Berkes et al. [7] and dating back at least 
to Vostrikova [61]. 

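Assumption 5.2. The observations follow the model

$$X_i(t) = \begin{cases}\mu_1(t) + Y_i(t), & 1 \le i \le k^*,\\ \mu_2(t) + Y_i(t), & k^* < i \le N,\end{cases}$$

in which the $Y_i$ satisfy Assumption 5.1, the mean functions $\mu_1$ and $\mu_2$ are in $L^2$ and

$$k^* = \lfloor N\theta\rfloor \quad\text{for some } 0 < \theta < 1.$$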

The general idea of testing is similar to that developed in Berkes et al. [7] for independent observations; the central difficulty is in accommodating the dependence. To define the test statistic, recall that bold symbols denote $d$-dimensional vectors, for example, $\hat{\boldsymbol\eta}_i = [\hat\eta_{1i}, \hat\eta_{2i}, \dots, \hat\eta_{di}]^T$. To lighten the notation, define the partial sums process $S_N(x, \boldsymbol\xi) = \sum_{n=1}^{\lfloor Nx\rfloor}\boldsymbol\xi_n$, $x \in [0,1]$, and the process $L_N(x, \boldsymbol\xi) = S_N(x, \boldsymbol\xi) - xS_N(1, \boldsymbol\xi)$, where $\{\boldsymbol\xi_n\}$ is a generic $\mathbb R^d$-valued sequence. Denote by $\Sigma(\boldsymbol\xi)$ the long-run variance of the sequence $\{\boldsymbol\xi_n\}$, and by $\hat\Sigma(\boldsymbol\xi)$ its kernel estimator (see Section 4). The proposed test statistic is then

$$T_N(d) = \frac1N\int_0^1 L_N(x, \hat{\boldsymbol\eta})^T\,\hat\Sigma(\hat{\boldsymbol\eta})^{-1}\,L_N(x, \hat{\boldsymbol\eta})\,dx. \tag{5.2}$$

Our first theorem establishes its asymptotic null distribution. 

Theorem 5.1. Suppose $H_0$ and Assumption 5.1 hold. If the estimator $\hat\Sigma(\boldsymbol\beta)$ is consistent, then

$$T_N(d) \xrightarrow{d} T(d) := \sum_{\ell=1}^{d}\int_0^1 B_\ell^2(x)\,dx, \tag{5.3}$$

where $\{B_\ell(x), x \in [0,1]\}$, $1 \le \ell \le d$, are independent Brownian bridges.
The distribution of the random variable $T(d)$ was derived by Kiefer [39]. The limit distribution is the same as in the case of independent observations; this is possible because the long-run variance estimator $\hat\Sigma(\hat{\boldsymbol\eta})$ soaks up the dependence. Sufficient conditions for its consistency are stated in Section 4, and, in addition to the assumptions of Theorem 5.1, they are: Assumption 4.1 with $q^4/N \to 0$, and condition (4.9).
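The limit $T(d)$ in (5.3) is also easy to approximate by simulation. The sketch below is an illustration and does not reproduce Kiefer's [39] derivation. It builds each Brownian bridge as $B(x) = W(x) - xW(1)$ from a Gaussian random walk $W$ on a grid of $n$ points and replaces $\int_0^1 B^2(x)\,dx$ by a Riemann sum. The grid size, the number of replications, the seed and the function names are arbitrary choices. Since $E\int_0^1 B^2(x)\,dx = \int_0^1 x(1-x)\,dx = 1/6$, the simulated mean of $T(d)$ should be close to $d/6$.

```python
import math
import random


def simulate_T(d, n_grid=200, reps=1000, seed=1):
    """Monte Carlo draws of T(d) = sum_{l=1}^d int_0^1 B_l(x)^2 dx."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        total = 0.0
        for _ in range(d):  # d independent Brownian bridges
            w = [0.0]
            for _ in range(n_grid):  # random-walk approximation of W at k/n_grid
                w.append(w[-1] + rng.gauss(0.0, 1.0) / math.sqrt(n_grid))
            # Riemann sum of B(x)^2 with B(k/n) = W(k/n) - (k/n) W(1)
            total += sum((w[k] - (k / n_grid) * w[-1]) ** 2
                         for k in range(1, n_grid + 1)) / n_grid
        draws.append(total)
    return draws


def upper_quantile(draws, alpha):
    """Empirical (1 - alpha)-quantile, i.e. an approximate level-alpha critical value."""
    s = sorted(draws)
    return s[min(len(s) - 1, math.ceil((1 - alpha) * len(s)) - 1)]
```

For $d = 3$, the simulated 95% quantile should be near the critical value 1.00 used in Example 5.2.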

The next result shows that our test has asymptotic power 1. Our proof requires the following condition:

$$\hat\Sigma(\hat{\boldsymbol\eta}) \xrightarrow{P} \Omega, \quad\text{where } \Omega \text{ is some positive definite matrix}. \tag{5.4}$$

Condition (5.4) could be replaced by weaker technical conditions, but we prefer it, as it leads to a transparent, short proof. Essentially, it states that the matrix $\hat\Sigma(\hat{\boldsymbol\eta})$ does not become degenerate in the limit, and the matrix $\Omega$ has only positive eigenvalues. A condition like (5.4) is not needed for independent $Y_i$ because that case does not require normalization with the long-run covariance matrix. To formulate our result, introduce vectors $\boldsymbol\mu_1, \boldsymbol\mu_2 \in \mathbb R^d$ with coordinates

$$\int\mu_1(t)v_\ell(t)\,dt \quad\text{and}\quad \int\mu_2(t)v_\ell(t)\,dt, \qquad 1 \le \ell \le d.$$

Theorem 5.2. Suppose Assumption 5.2 and condition (5.4) hold. If the vectors $\boldsymbol\mu_1$ and $\boldsymbol\mu_2$ are not equal, then $T_N(d) \xrightarrow{P} \infty$.

We conclude this section with two numerical examples which illustrate the effect of dependence on our change point detection procedure. Example 5.1 uses synthetic data while Example 5.2 focuses on particulate pollution data. Both show that using statistic (5.2) with $\hat\Sigma(\hat{\boldsymbol\eta})$ being the estimate for just the covariance, not the long-run covariance matrix, leads to spurious rejections of $H_0$: a nonexistent change point can be detected with a large probability.
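To make the comparison concrete, here is a minimal Python sketch of the statistic (5.2). It assumes that the score vectors $\hat{\boldsymbol\eta}_1, \dots, \hat{\boldsymbol\eta}_N$ have already been computed. It estimates the normalizing matrix by (4.8) with Bartlett weights, so that $q = 0$ gives normalization by the sample covariance only and $q > 0$ gives the long-run estimator. The integral over $x$ is approximated by a Riemann sum over $x = k/N$, $k = 1, \dots, N$. This discretization, the explicit centering step and the function names are illustrative assumptions.

```python
def long_run_cov(xs, q):
    """Kernel estimator (4.8) with Bartlett weights; xs is a list of N d-vectors."""
    N, d = len(xs), len(xs[0])
    mean = [sum(x[i] for x in xs) / N for i in range(d)]
    z = [[x[i] - mean[i] for i in range(d)] for x in xs]  # centre (a no-op for scores)
    S = [[0.0] * d for _ in range(d)]
    for r in range(-q, q + 1):
        w = 1.0 - abs(r) / (1.0 + q)
        for n in range(N - abs(r)):
            # Gamma_hat_r: X_n X_{n+r}^T for r >= 0, X_{n+|r|} X_n^T for r < 0
            a, b = (z[n], z[n + r]) if r >= 0 else (z[n - r], z[n])
            for i in range(d):
                for j in range(d):
                    S[i][j] += w * a[i] * b[j] / N
    return S


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def change_point_stat(eta, q):
    """T_N(d) = N^{-1} int_0^1 L_N(x)^T Sigma_hat^{-1} L_N(x) dx, integral by a Riemann sum."""
    N, d = len(eta), len(eta[0])
    Sigma = long_run_cov(eta, q)
    total = [sum(e[i] for e in eta) for i in range(d)]            # S_N(1, eta)
    partial, acc = [0.0] * d, 0.0
    for k in range(1, N + 1):
        partial = [partial[i] + eta[k - 1][i] for i in range(d)]  # S_N(k/N, eta)
        L = [partial[i] - (k / N) * total[i] for i in range(d)]  # L_N(k/N, eta)
        acc += sum(li * si for li, si in zip(L, solve(Sigma, L)))
    return acc / N ** 2  # (1/N) times the Riemann sum (1/N) sum_k
```

For example, a scalar sequence with 50 values $-1$ followed by 50 values $+1$ (a clear mean shift) gives $T_{100}(1) = 8.335$ with $q = 0$.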

Example 5.1. We simulate 200 observations of the functional AR(1) process of Example 2.1, when $\Psi$ has the parabolic integral kernel $\psi(t,s) = \gamma\cdot\bigl(2 - (2t-1)^2 - (2s-1)^2\bigr)$. We chose the constant $\gamma$ such that $\|\Psi\|_{\mathcal S} = 0.6$ (the Hilbert-Schmidt norm). The innovations $\{\varepsilon_n\}$ are standard Brownian bridges. The first 3 principal components explain approximately 85% of the total variance, so we compute the test statistic $T_{200}(3)$ given in (5.2). For the estimation of the long-run covariance matrix $\Sigma$ we use the Bartlett kernel

$$\omega_q(j) = \begin{cases}1 - |j|/(1+q), & \text{if } |j| \le q,\\ 0, & \text{otherwise.}\end{cases}$$
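The scaling constant $\gamma$ in this example can be computed directly. A short calculation with $u = 2t-1$ and $u' = 2s-1$ uniform on $[-1,1]$ gives $\iint\bigl(2-(2t-1)^2-(2s-1)^2\bigr)^2\,dt\,ds = 88/45$, so $\gamma = 0.6/\sqrt{88/45} \approx 0.429$. The sketch below checks this value with a midpoint rule. The grid size and the function names are our choices.

```python
import math


def parabolic_kernel(t, s):
    """Kernel of Example 5.1 without the scaling constant gamma."""
    return 2.0 - (2.0 * t - 1.0) ** 2 - (2.0 * s - 1.0) ** 2


def hs_norm(kernel, m=200):
    """Hilbert-Schmidt norm (iint kernel^2 dt ds)^{1/2} on [0,1]^2 by the midpoint rule."""
    h = 1.0 / m
    pts = [(i + 0.5) * h for i in range(m)]
    return math.sqrt(sum(kernel(t, s) ** 2 for t in pts for s in pts) * h * h)


# gamma chosen so that the scaled kernel has Hilbert-Schmidt norm 0.6
gamma = 0.6 / hs_norm(parabolic_kernel)
```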
We first let $q = 0$, which corresponds to using just the sample covariance of $\{\hat{\boldsymbol\eta}_n\}$ in the normalization for the test statistic (5.2) (dependence is ignored). We use 1000 replications and the 5% significance level. The rejection rate is 23.9%, much higher than the nominal level of 5%. In contrast, using an appropriate estimate for the long-run variance, the reliability of the test improves dramatically. Choosing an optimal bandwidth $q$ is a separate problem which we do not pursue here. Here we adapt the formula $q \approx 1.1447(aN)^{1/3}$, $a = 4\psi^2/(1+\psi)^4$, valid for a scalar AR(1) process with the autoregressive coefficient $\psi$ (Andrews [3]). Using this formula with $\psi = \|\Psi\|_{\mathcal S} = 0.6$ results in $q = 4$. This choice gives the empirical rejection rate of 3.7%, much closer to the nominal rate of 5%.
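The bandwidth rule quoted above takes one line to evaluate. The sketch below uses a hypothetical function name. It computes $1.1447(aN)^{1/3}$ with $a = 4\psi^2/(1+\psi)^4$ and rounds to the nearest integer. For $\psi = 0.6$, $N = 200$ (this example) and for $\psi = 0.70$, $N = 151$ (Example 5.2), the unrounded values are about 4.04 and 3.76, so both round to $q = 4$.

```python
def ar1_bartlett_bandwidth(psi, n):
    """q ~ 1.1447 (a n)^{1/3} with a = 4 psi^2 / (1 + psi)^4, rounded to an integer."""
    a = 4.0 * psi ** 2 / (1.0 + psi) ** 4
    return round(1.1447 * (a * n) ** (1.0 / 3.0))
```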

Example 5.2. This example, which uses pm10 (particulate matter with diameter < 10 μm, measured in μg/m³) data, illustrates a similar phenomenon as Example 5.1. For the analysis we use pm10 concentration data measured in the Austrian city of Graz during the winter of 2008/2009 ($N = 151$). The data are given at a 30-minute resolution, yielding an intraday frequency of 48 observations. As in Stadtlober, Hörmann and Pfeiler [58] we use a square root transformation to reduce heavy tails. Next we remove possible weekly periodicity by subtracting the corresponding mean vectors obtained from the different weekdays. A time series plot of this new sequence is given in Figure 2. The data look relatively stable, although a shift appears to be possible in the center of the time series. It should be emphasized, however, that pm10 data, like many geophysical time series, exhibit a strong, persistent, positive autocorrelation structure. These series are stationary over long periods of time with an appearance of local trends or shifts at various time scales (random self-similar or fractal structure).

Fig. 2. Seasonally detrended √pm10, Nov 1, 2008-Mar 31, 2009 (horizontal axis: day).

Fig. 3. Left panel: sample autocorrelation function of the first empirical PC scores. Right panel: sample partial autocorrelation function of the first empirical PC scores.

The daily measurement vectors are transformed into smooth functional data using 15 B-spline functions of order 4. The functional principal component analysis yields that the first three principal components explain approximately 84% of the total variability, so we use statistic (5.2) with $d = 3$. A look at the acf and pacf of the first empirical PC scores (Figure 3) suggests AR(1), or perhaps AR(3), behavior. The second and third empirical PC scores show no significant autocorrelation structure. We use the formula given in Example 5.1 with $\psi = 0.70$ (acf at lag 1) and $N = 151$ and obtain $q \approx 4$. This gives $T_{151}(3) = 0.94$, which is close to the critical value 1.00 when testing at the 5% significance level but does not support rejection of the no-change hypothesis. In contrast, using only the sample covariance matrix in (5.2) gives $T_{151}(3) = 1.89$ and thus a clear and possibly wrongful rejection of the null hypothesis.
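The preprocessing described at the start of this example is a square root transformation followed by subtraction of weekday-specific mean vectors. It can be sketched as follows. The input layout (one list of intraday values per day), the weekday labels 0 to 6 and the function name are assumptions for illustration.

```python
import math


def deseasonalize_weekly(days, weekdays):
    """Square-root transform daily vectors, then subtract the mean vector of each weekday.

    days: list of equally long lists (e.g. 48 half-hourly values per day);
    weekdays: matching list of labels such as 0 (Monday) ... 6 (Sunday).
    """
    roots = [[math.sqrt(v) for v in day] for day in days]
    groups = {}
    for vec, wd in zip(roots, weekdays):
        groups.setdefault(wd, []).append(vec)
    # coordinatewise mean vector for each weekday
    means = {wd: [sum(col) / len(vecs) for col in zip(*vecs)]
             for wd, vecs in groups.items()}
    return [[v - m for v, m in zip(vec, means[wd])]
            for vec, wd in zip(roots, weekdays)]
```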

6. Functional linear model with dependent regressors. The functional 
linear model is one of the most widely used tools of FDA. Its various forms 
are introduced in Chapters 12-17 of Ramsay and Silverman [50]. To name a few recent references we mention Cuevas, Febrero and Fraiman [22], Malfait and Ramsay [41], Cardot et al. [18], Cardot, Ferraty and Sarda [19], Chiou, Müller and Wang [21], Müller and Stadtmüller [46], Yao, Müller and Wang [65], Cai and Hall [17], Chiou and Müller [20], Li and Hsing [40], Reiss and Ogden [51], Reiss and Ogden [52, 53].
We focus on the fully functional model of the form

$$Y_n(t) = \int\psi(t,s)X_n(s)\,ds + \varepsilon_n(t), \qquad n = 1, 2, \dots, N, \tag{6.1}$$

in which both the regressors and the responses are functions. The results of this section can be easily specialized to the case of scalar responses.

In (6.1), the regressors are random functions, assumed to be independent 
and identically distributed. As explained in Section 1, for functional time 
series the assumption of the independence of the X n is often questionable, 
so it is important to investigate if procedures developed and theoretically 
justified for independent regressors can still be used if the regressors are 
dependent. 

We focus here on the estimation of the kernel $\psi(t,s)$. Our result is motivated by the work of Yao, Müller and Wang [65] who considered functional regressors and responses obtained from sparse independent data measured with error. The data that motivate our work are measurements of physical quantities obtained with negligible errors or financial transaction data obtained without error. In both cases the data are available at fine time grids, and the main concern is the presence of temporal dependence between the curves $X_n$. We therefore merely assume that the sequence $\{X_n\} \in L^4_H$ is $L^4$-m-approximable, which, as can be easily seen, implies the $L^4$-m-approximability of $\{Y_n\}$. To formulate additional technical assumptions, we need to introduce some notation.

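We assume that the errors $\varepsilon_n$ are i.i.d. and independent of the $X_n$, and denote by $X$ and $Y$ random functions with the same distribution as $X_n$ and $Y_n$, respectively. We work with their expansions

$$X(s) = \sum_{i=1}^{\infty}\xi_i v_i(s), \qquad Y(t) = \sum_{j=1}^{\infty}\zeta_j u_j(t),$$

where the $v_j$ are the FPCs of $X$ and the $u_j$ the FPCs of $Y$, and $\xi_i = \langle X, v_i\rangle$, $\zeta_j = \langle Y, u_j\rangle$. Indicating with the "hat" the corresponding empirical quantities, an estimator of $\psi(t,s)$ proposed by Yao, Müller and Wang [65] is

$$\hat\psi_{KL}(t,s) = \sum_{k=1}^{K}\sum_{\ell=1}^{L}\hat\lambda_\ell^{-1}\hat\sigma_{\ell k}\,\hat u_k(t)\hat v_\ell(s),$$

where $\hat\sigma_{\ell k}$ is an estimator of $E[\xi_\ell\zeta_k]$. We will work with the simplest estimator,

$$\hat\sigma_{\ell k} = \frac1N\sum_{i=1}^{N}\langle X_i, \hat v_\ell\rangle\langle Y_i, \hat u_k\rangle, \tag{6.2}$$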
but any estimator for which Lemma A.1 holds can be used without affecting the rates.
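Once empirical FPCs are available, the estimator $\hat\psi_{KL}$ with the simple choice (6.2) takes only a few lines of code. The sketch assumes that all curves, the $\hat v_\ell$ and the $\hat u_k$ are sampled on a common equispaced midpoint grid in $[0,1]$, with inner products approximated by grid averages. It also assumes that the empirical eigenvalues $\hat\lambda_\ell$ are supplied by the caller, for instance from an FPCA. The function names are hypothetical.

```python
import math


def inner(f, g):
    """<f, g> = int_0^1 f g, approximated by the grid average."""
    return sum(a * b for a, b in zip(f, g)) / len(f)


def sigma_hat(X, Y, v_hat, u_hat):
    """(6.2): sigma_hat[l][k] = N^{-1} sum_i <X_i, v_hat_l> <Y_i, u_hat_k>."""
    N = len(X)
    return [[sum(inner(X[i], vl) * inner(Y[i], uk) for i in range(N)) / N
             for uk in u_hat] for vl in v_hat]


def psi_hat_KL(X, Y, lam_hat, v_hat, u_hat):
    """Grid values psi[a][b] = sum_k sum_l sigma_hat[l][k] / lam_hat[l] * u_hat_k(t_a) v_hat_l(s_b)."""
    S = sigma_hat(X, Y, v_hat, u_hat)
    L, K = len(v_hat), len(u_hat)
    return [[sum(S[l][k] / lam_hat[l] * u_hat[k][a] * v_hat[l][b]
                 for l in range(L) for k in range(K))
             for b in range(len(v_hat[0]))]
            for a in range(len(u_hat[0]))]
```

In a toy case with $X_i = \xi_i v$ and $Y_i = 2\xi_i u$ for orthonormal $u, v$ and $K = L = 1$, the estimate reduces to $2u(t)v(s)$.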

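Let $\lambda_j$ and $\gamma_j$ be the eigenvalues corresponding to $v_j$ and $u_j$. Define $\alpha_j$ as in Lemma 3.2, and define $\alpha'_j$ accordingly with $\gamma_j$ instead of $\lambda_j$. Set

$$h_L = \min\{\alpha_j, 1 \le j \le L\}, \qquad h'_L = \min\{\alpha'_j, 1 \le j \le L\}.$$

To establish the consistency of the estimator $\hat\psi_{KL}(t,s)$ we assume that

$$\sum_{k=1}^{\infty}\sum_{\ell=1}^{\infty}\frac{\sigma_{\ell k}^2}{\lambda_\ell^2} < \infty, \qquad \sigma_{\ell k} = E[\xi_\ell\zeta_k], \tag{6.3}$$

and that the following assumption holds: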

Assumption 6.1. (i) We have $\lambda_1 > \lambda_2 > \cdots$ and $\gamma_1 > \gamma_2 > \cdots$.

(ii) We have K = K(N), L = L(N) 00 and Xl mi ^ K K} = o(N^). 

For model (6.1), condition (6.3) is equivalent to the assumption that $\psi(t,s)$ is a Hilbert-Schmidt kernel, that is, $\iint\psi^2(t,s)\,dt\,ds < \infty$. It is formulated in the same way as in Yao, Müller and Wang [65] because this form is convenient in the theoretical arguments. Assumption 6.1 is much shorter than the corresponding assumptions of Yao, Müller and Wang [65], which take up over two pages. This is because we do not deal with smoothing and so can isolate the impact of the magnitude of the eigenvalues on the bandwidths $K$ and $L$.

Theorem 6.1. Suppose $\{X_n\} \in L^4_H$ is a zero mean $L^4$-m-approximable sequence independent of the sequence of i.i.d. errors $\{\varepsilon_n\}$. If (6.3) and Assumption 6.1 hold, then

$$\iint\bigl[\hat\psi_{KL}(t,s) - \psi(t,s)\bigr]^2\,dt\,ds \xrightarrow{P} 0 \qquad (N \to \infty). \tag{6.4}$$



The statement of Theorem 6.1 is comparable to the first part of Theorem 1 in Yao, Müller and Wang [65]. Both theorems are established under (6.3) and finite fourth moment conditions. Otherwise the settings are quite different. Yao, Müller and Wang [65] work under the assumption that the subjects $(Y_i, X_i)$, $i = 1, 2, \dots$, are independent and sparsely observed, whereas the crucial point of our approach is that we allow dependence. Thus Theorems 1 and 2 in the related paper Yao, Müller and Wang [66], which serve as the basic ingredients for their results, cannot be used here and have to be replaced directly with the theory developed in Section 3 of this paper. Furthermore, our proof does without complicated assumptions on the resolvents of the covariance operator, in particular without the very technical assumptions (B.5) of Yao, Müller and Wang [65]. In this sense, our short alternative proof might be of value even in the case of independent observations.
APPENDIX

We present the proofs of results stated in Sections 3-6. Throughout we will agree on the following conventions. All $X_n \in L^2_H$ satisfy Assumption 3.1. A generic $X$, which is assumed to be equal in distribution to $X_1$, will be used at some places. Any constants occurring will be denoted by $\kappa_1, \kappa_2, \dots$. The $\kappa_i$ may change their values from proof to proof.

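A.1. Proofs of the results of Sections 3 and 4.

Proof of Theorem 3.1. We assume for simplicity that $EX = 0$ and set

$$\hat c(t,s) = N^{-1}\sum_{n=1}^{N}X_n(t)X_n(s), \qquad c(t,s) = E[X(t)X(s)].$$

The proof with a general mean function requires some additional but similar arguments. The Cauchy-Schwarz inequality shows that $\hat c(\cdot,\cdot)$ and $c(\cdot,\cdot)$ are Hilbert-Schmidt kernels, so $\hat C - C$ is a Hilbert-Schmidt operator with the kernel $\hat c(t,s) - c(t,s)$. Consequently,

$$N E\|\hat C - C\|_{\mathcal S}^2 = N\iint\operatorname{Var}\Bigl(N^{-1}\sum_{n=1}^{N}\bigl(X_n(t)X_n(s) - E[X_n(t)X_n(s)]\bigr)\Bigr)\,dt\,ds.$$

For fixed $s$ and $t$, set $Y_n = X_n(t)X_n(s) - E[X_n(t)X_n(s)]$. Due to the stationarity of the sequence $\{Y_n\}$ we have

$$\operatorname{Var}\Bigl(N^{-1}\sum_{n=1}^{N}Y_n\Bigr) = N^{-1}\sum_{|r|<N}\Bigl(1 - \frac{|r|}{N}\Bigr)\operatorname{Cov}(Y_1, Y_{1+r})$$

and so

$$N\operatorname{Var}\Bigl(N^{-1}\sum_{n=1}^{N}Y_n\Bigr) \le \operatorname{Var}(Y_1) + 2\sum_{r=1}^{\infty}|\operatorname{Cov}(Y_1, Y_{1+r})|.$$

Setting $Y_{1+r}^{(r)} = X_{1+r}^{(r)}(t)X_{1+r}^{(r)}(s) - E[X_n(t)X_n(s)]$, which is independent of $Y_1$, we obtain

$$|\operatorname{Cov}(Y_1, Y_{1+r})| = \bigl|\operatorname{Cov}\bigl(Y_1, Y_{1+r} - Y_{1+r}^{(r)}\bigr)\bigr| \le [\operatorname{Var}(Y_1)]^{1/2}\bigl[\operatorname{Var}\bigl(Y_{1+r} - Y_{1+r}^{(r)}\bigr)\bigr]^{1/2}.$$

Consequently, $NE\|\hat C - C\|_{\mathcal S}^2$ is bounded from above by

$$\iint\operatorname{Var}[X(t)X(s)]\,dt\,ds + 2\sum_{r=1}^{\infty}\iint[\operatorname{Var}(X(t)X(s))]^{1/2}\bigl[\operatorname{Var}\bigl(X_{1+r}(t)X_{1+r}(s) - X_{1+r}^{(r)}(t)X_{1+r}^{(r)}(s)\bigr)\bigr]^{1/2}\,dt\,ds.$$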
For the first summand we have the upper bound $v_4^4(X)$ because

$$\iint E[X^2(t)X^2(s)]\,dt\,ds = E\int X^2(t)\,dt\int X^2(s)\,ds = v_4^4(X). \tag{A.1}$$

To find upper bounds for the summands in the infinite sum, we use the inequality

$$|ab - cd|^2 \le 2a^2(b-d)^2 + 2d^2(a-c)^2, \tag{A.2}$$

which yields

$$\begin{aligned}
&\iint[\operatorname{Var}(X(t)X(s))]^{1/2}\bigl[\operatorname{Var}\bigl(X_{1+r}(t)X_{1+r}(s) - X_{1+r}^{(r)}(t)X_{1+r}^{(r)}(s)\bigr)\bigr]^{1/2}\,dt\,ds\\
&\quad\le \iint[E(X^2(t)X^2(s))]^{1/2}\bigl[E\bigl(X_{1+r}(t)X_{1+r}(s) - X_{1+r}^{(r)}(t)X_{1+r}^{(r)}(s)\bigr)^2\bigr]^{1/2}\,dt\,ds\\
&\quad\le \sqrt2\iint[E(X^2(t)X^2(s))]^{1/2}\bigl[EX_{1+r}^2(t)\bigl(X_{1+r}(s) - X_{1+r}^{(r)}(s)\bigr)^2\bigr]^{1/2}\,dt\,ds\\
&\qquad+ \sqrt2\iint[E(X^2(t)X^2(s))]^{1/2}\bigl[E\bigl(X_{1+r}^{(r)}(s)\bigr)^2\bigl(X_{1+r}(t) - X_{1+r}^{(r)}(t)\bigr)^2\bigr]^{1/2}\,dt\,ds.
\end{aligned}$$

For the first term, using the Cauchy-Schwarz inequality and (A.1), we obtain

$$\begin{aligned}
&\iint[E(X^2(t)X^2(s))]^{1/2}\bigl[EX_{1+r}^2(t)\bigl(X_{1+r}(s) - X_{1+r}^{(r)}(s)\bigr)^2\bigr]^{1/2}\,dt\,ds\\
&\quad\le v_4^2(X)\Bigl\{E\Bigl[\int X_{1+r}^2(t)\,dt\int\bigl(X_{1+r}(s) - X_{1+r}^{(r)}(s)\bigr)^2\,ds\Bigr]\Bigr\}^{1/2}\\
&\quad\le v_4^2(X)\bigl\{E\|X_{1+r}\|^4\bigr\}^{1/4}\bigl\{E\|X_{1+r} - X_{1+r}^{(r)}\|^4\bigr\}^{1/4}\\
&\quad= v_4^3(X)\,v_4\bigl(X_{1+r} - X_{1+r}^{(r)}\bigr).
\end{aligned}$$

The exact same argument applies to the second term. The above bounds imply (3.4). □

Proof of Theorem 4.1. As in Giraitis et al. [29], set $\mu = EX_0$,

$$\tilde\gamma_j = \frac1N\sum_{i=1}^{N-|j|}(X_i - \mu)(X_{i+|j|} - \mu), \qquad S_{k,\ell} = \sum_{i=k}^{\ell}(X_i - \mu).$$
Observe that

$$\hat\gamma_j - \tilde\gamma_j = \Bigl(1 - \frac{|j|}{N}\Bigr)(\bar X_N - \mu)^2 - \frac1N(\bar X_N - \mu)\bigl(S_{1,N-|j|} + S_{|j|+1,N}\bigr) =: \delta_j.$$
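The identity for $\hat\gamma_j - \tilde\gamma_j$ is purely algebraic, so it can be checked numerically on any finite series. In this sketch, `mu` is an arbitrary number standing in for $EX_0$, and indices are shifted to Python's 0-based convention. The function names are illustrative.

```python
def gamma_hat(x, j):
    """Sample autocovariance centred at the sample mean."""
    N, j = len(x), abs(j)
    m = sum(x) / N
    return sum((x[i] - m) * (x[i + j] - m) for i in range(N - j)) / N


def gamma_tilde(x, j, mu):
    """The same sum, centred at mu instead of the sample mean."""
    N, j = len(x), abs(j)
    return sum((x[i] - mu) * (x[i + j] - mu) for i in range(N - j)) / N


def delta(x, j, mu):
    """(1 - |j|/N)(Xbar - mu)^2 - N^{-1}(Xbar - mu)(S_{1,N-|j|} + S_{|j|+1,N})."""
    N, j = len(x), abs(j)
    d = sum(x) / N - mu
    s_first = sum(x[i] - mu for i in range(0, N - j))  # S_{1, N-|j|}
    s_last = sum(x[i] - mu for i in range(j, N))       # S_{|j|+1, N}
    return (1 - j / N) * d * d - d * (s_first + s_last) / N
```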

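We therefore have the decomposition

$$\hat\sigma^2 = \sum_{|j|\le q}\omega_q(j)\tilde\gamma_j + \sum_{|j|\le q}\omega_q(j)\delta_j =: \tilde\sigma^2 + \sigma_\delta^2.$$

The proof will be complete once we have shown that

$$\tilde\sigma^2 \xrightarrow{P} \sum_{j=-\infty}^{\infty}\gamma_j \tag{A.3}$$

and

$$\sigma_\delta^2 \xrightarrow{P} 0. \tag{A.4}$$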

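We begin with the verification of the easier relation (A.4). By (4.1),

$$\begin{aligned}
E|\sigma_\delta^2| &\le b\sum_{|j|\le q}E|\delta_j|\\
&\le b\sum_{|j|\le q}E(\bar X_N - \mu)^2 + \frac bN\sum_{|j|\le q}\bigl[E(\bar X_N - \mu)^2\bigr]^{1/2}\bigl[E\bigl(S_{1,N-|j|} + S_{|j|+1,N}\bigr)^2\bigr]^{1/2}.
\end{aligned}$$

By Lemma 4.1,

$$E(\bar X_N - \mu)^2 = \frac1N\sum_{|j|<N}\Bigl(1 - \frac{|j|}{N}\Bigr)\gamma_j = O(N^{-1}).$$

Similarly $E(S_{1,N-|j|} + S_{|j|+1,N})^2 = O(N)$. Therefore,

$$E|\sigma_\delta^2| = O\bigl(qN^{-1} + N^{-1}N^{-1/2}qN^{1/2}\bigr) = O(q/N).$$

We now turn to the verification of (A.3). We will show that $E\tilde\sigma^2 \to \sum_j\gamma_j$ and $\operatorname{Var}[\tilde\sigma^2] \to 0$. By (4.2),

$$E\tilde\sigma^2 = \sum_{|j|\le q}\omega_q(j)\Bigl(1 - \frac{|j|}{N}\Bigr)\gamma_j \to \sum_{j=-\infty}^{\infty}\gamma_j.$$

By (4.1), it remains to show that

$$\sum_{|k|,|\ell|\le q}\bigl|\operatorname{Cov}(\tilde\gamma_k, \tilde\gamma_\ell)\bigr| \to 0. \tag{A.5}$$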
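To lighten the notation, without any loss of generality, we assume from now on that $\mu = 0$, so that

$$\operatorname{Cov}(\tilde\gamma_k, \tilde\gamma_\ell) = \frac{1}{N^2}\operatorname{Cov}\Bigl(\sum_{i=1}^{N-|k|}X_iX_{i+|k|},\ \sum_{j=1}^{N-|\ell|}X_jX_{j+|\ell|}\Bigr).$$

Therefore, by stationarity,

$$\begin{aligned}
|\operatorname{Cov}(\tilde\gamma_k, \tilde\gamma_\ell)| &\le \frac{1}{N^2}\sum_{i,j=1}^{N}\bigl|\operatorname{Cov}(X_iX_{i+|k|}, X_jX_{j+|\ell|})\bigr|\\
&\le \frac1N\sum_{|r|<N}\Bigl(1 - \frac{|r|}{N}\Bigr)\bigl|\operatorname{Cov}(X_0X_{|k|}, X_rX_{r+|\ell|})\bigr|.
\end{aligned}$$

The last sum can be split into three terms corresponding to $r = 0$, $r < 0$ and $r > 0$.

The contribution to the left-hand side of (A.5) of the term corresponding to $r = 0$ is

$$N^{-1}\sum_{|k|,|\ell|\le q}\bigl|\operatorname{Cov}(X_0X_{|k|}, X_0X_{|\ell|})\bigr| = O(q^2/N).$$

The terms corresponding to $r < 0$ and $r > 0$ are handled in the same way, so we focus on the contribution of the summands with $r > 0$, which is

$$N^{-1}\sum_{|k|,|\ell|\le q}\sum_{r=1}^{N-1}\Bigl(1 - \frac rN\Bigr)\bigl|\operatorname{Cov}(X_0X_{|k|}, X_rX_{r+|\ell|})\bigr|.$$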

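We now use the decompositions

$$\operatorname{Cov}(X_0X_{|k|}, X_rX_{r+|\ell|}) = \operatorname{Cov}\bigl(X_0X_{|k|}, X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr) + \operatorname{Cov}\bigl(X_0X_{|k|}, X_rX_{r+|\ell|} - X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr)$$

and

$$\operatorname{Cov}\bigl(X_0X_{|k|}, X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr) = \operatorname{Cov}\bigl(X_0X_{|k|}^{(|k|)}, X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr) + \operatorname{Cov}\bigl(X_0\bigl(X_{|k|} - X_{|k|}^{(|k|)}\bigr), X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr).$$

By Definition 2.1, $X_0$ depends on $\varepsilon_0, \varepsilon_{-1}, \dots$, while the random variables $X_{|k|}^{(|k|)}$, $X_r^{(r)}$ and $X_{r+|\ell|}^{(r+|\ell|)}$ depend on $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_{|k|\vee(r+|\ell|)}$ and errors independent of the $\varepsilon_i$. Therefore $\operatorname{Cov}(X_0X_{|k|}^{(|k|)}, X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)})$ is equal to

$$\begin{aligned}
&E\bigl[X_0X_{|k|}^{(|k|)}X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr] - E\bigl[X_0X_{|k|}^{(|k|)}\bigr]E\bigl[X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr]\\
&\quad= E[X_0]\,E\bigl[X_{|k|}^{(|k|)}X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr] - E[X_0]\,E\bigl[X_{|k|}^{(|k|)}\bigr]E\bigl[X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr] = 0.
\end{aligned}$$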
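We thus obtain

$$\operatorname{Cov}(X_0X_{|k|}, X_rX_{r+|\ell|}) = \operatorname{Cov}\bigl(X_0\bigl(X_{|k|} - X_{|k|}^{(|k|)}\bigr), X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr) + \operatorname{Cov}\bigl(X_0X_{|k|}, X_rX_{r+|\ell|} - X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr).$$

By assumption (4.5), it remains to verify that

$$N^{-1}\sum_{|k|,|\ell|\le q}\sum_{r=1}^{N-1}\bigl|\operatorname{Cov}\bigl(X_0X_{|k|}, X_rX_{r+|\ell|} - X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr)\bigr| \to 0.$$

This is done using the technique introduced in the proof of Theorem 3.1. By the Cauchy-Schwarz inequality, the problem reduces to showing that

$$N^{-1}\sum_{|k|,|\ell|\le q}\sum_{r=1}^{N-1}\bigl\{E\bigl[X_0^2X_{|k|}^2\bigr]\bigr\}^{1/2}\bigl\{E\bigl[\bigl(X_rX_{r+|\ell|} - X_r^{(r)}X_{r+|\ell|}^{(r+|\ell|)}\bigr)^2\bigr]\bigr\}^{1/2} \to 0.$$

Using (A.2), this in turn is bounded by a constant times

$$N^{-1}\sum_{|k|,|\ell|\le q}\sum_{r=1}^{\infty}v_4\bigl(X_r - X_r^{(r)}\bigr),$$

which tends to zero by $L^4$-m-approximability and the condition $q^2/N \to 0$. □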

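Proof of Proposition 4.1. We only show the first part; the second is similar. Let $\omega_q(h)$ be the Bartlett weights satisfying Assumption 4.1. Without loss of generality we will assume below that the constant $b$ in (4.1) is 1. Then the element in the $k$th row and $\ell$th column of $\hat\Sigma(\boldsymbol\beta) - \hat\Sigma(C\hat{\boldsymbol\beta})$ is

$$\begin{aligned}
&\sum_{|h|\le q}\frac{\omega_q(h)}{N}\sum_{1\le n\le N-|h|}\bigl(\beta_{kn}\beta_{\ell,n+|h|} - \hat c_k\hat\beta_{kn}\hat c_\ell\hat\beta_{\ell,n+|h|}\bigr)\\
&\quad= \sum_{|h|\le q}\frac{\omega_q(h)}{N}\sum_{1\le n\le N-|h|}\beta_{kn}\bigl(\beta_{\ell,n+|h|} - \hat c_\ell\hat\beta_{\ell,n+|h|}\bigr)\\
&\qquad+ \sum_{|h|\le q}\frac{\omega_q(h)}{N}\sum_{1\le n\le N-|h|}\hat c_\ell\hat\beta_{\ell,n+|h|}\bigl(\beta_{kn} - \hat c_k\hat\beta_{kn}\bigr)\\
&\quad= F_1(N,k,\ell) + F_2(N,k,\ell).
\end{aligned}$$

For reasons of symmetry it is enough to estimate $F_1(N,k,\ell)$. We have for any $t_N > 0$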
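$$\begin{aligned}
P\bigl(|F_1(N,k,\ell)| > \varepsilon\bigr) &\le \sum_{|h|\le q}P\Bigl(\Bigl|\frac1N\sum_{1\le n\le N-|h|}\beta_{kn}\bigl(\beta_{\ell,n+|h|} - \hat c_\ell\hat\beta_{\ell,n+|h|}\bigr)\Bigr| > \frac{\varepsilon}{2q+1}\Bigr)\\
&\le \sum_{|h|\le q}P\Bigl(\frac1N\sum_{1\le n\le N-|h|}\beta_{kn}^2\cdot\frac1N\sum_{1\le n\le N-|h|}\bigl(\beta_{\ell,n+|h|} - \hat c_\ell\hat\beta_{\ell,n+|h|}\bigr)^2 > \frac{\varepsilon^2}{(2q+1)^2}\Bigr)\\
&\le (2q+1)P\Bigl(\sum_{1\le n\le N}\beta_{kn}^2 > N(2q+1)t_N\Bigr) + (2q+1)P\Bigl(\sum_{1\le n\le N}\bigl(\beta_{\ell n} - \hat c_\ell\hat\beta_{\ell n}\bigr)^2 > \frac{\varepsilon^2N}{t_N(2q+1)^3}\Bigr)\\
&= (2q+1)\bigl(P_1(k,N) + P_2(\ell,N)\bigr),
\end{aligned}$$

where the second step uses the Cauchy-Schwarz inequality. By the Markov inequality and the fact that the $\beta_{kn}$, $1 \le n \le N$, are identically distributed, we get for all $k \in \{1, \dots, d\}$

$$(2q+1)P_1(k,N) \le \frac{E\beta_{k1}^2}{t_N} \le \frac{E\|Y_1\|^2}{t_N},$$

which tends to zero as long as $t_N \to \infty$.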

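The estimation of $P_2(\ell,N)$ requires a little bit more effort. We notice first that

$$\limsup_{N\to\infty}\frac1N\operatorname{Var}\Bigl(\sum_{1\le n\le N}\|Y_n\|^2\Bigr) \le \sum_{h\in\mathbb Z}\bigl|\operatorname{Cov}\bigl(\|Y_1\|^2, \|Y_h\|^2\bigr)\bigr| < \infty. \tag{A.6}$$

The summability of the latter series follows by now routine estimates from (2.3). For any $x, y > 0$ we have

$$\begin{aligned}
P\Bigl(\sum_{1\le n\le N}\bigl(\beta_{\ell n} - \hat c_\ell\hat\beta_{\ell n}\bigr)^2 > x\Bigr) &= P\Bigl(\sum_{1\le n\le N}\Bigl(\int Y_n(t)\bigl(v_\ell(t) - \hat c_\ell\hat v_\ell(t)\bigr)\,dt\Bigr)^2 > x\Bigr)\\
&\le P\Bigl(\sum_{1\le n\le N}\|Y_n\|^2\,\|v_\ell - \hat c_\ell\hat v_\ell\|^2 > x\Bigr)\\
&\le P\Bigl(\sum_{1\le n\le N}\|Y_n\|^2 > xy\Bigr) + P\bigl(\|v_\ell - \hat c_\ell\hat v_\ell\|^2 > 1/y\bigr)\\
&=: P_{21}(N) + P_{22}(\ell,N).
\end{aligned}$$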
If we require that $y > NE\|Y_1\|^2/x$, then by the Markov inequality and (A.6) we have

$$P_{21}(N) \le \kappa_1\frac{N}{\bigl(xy - NE\|Y_1\|^2\bigr)^2}$$

for some constant $\kappa_1$ which does not depend on $N$. By Theorem 3.2 and again the Markov inequality there exists a constant $\kappa_2$ such that for all $\ell \in \{1, \dots, d\}$

$$P_{22}(\ell,N) \le \kappa_2\frac{y}{N}.$$

The $x$ in the term $P_2(\ell,N)$ is given by

$$x = \frac{\varepsilon^2N}{t_N(2q+1)^3}.$$

Set $y = 2NE\|Y_1\|^2/x$. Then for all $\ell \in \{1, \dots, d\}$

$$P_{21}(N) \le \frac{\kappa_1}{(E\|Y_1\|^2)^2N} \quad\text{and}\quad P_{22}(\ell,N) \le 2\kappa_2E\|Y_1\|^2\,\frac{t_N(2q+1)^3}{\varepsilon^2N}. \tag{A.7}$$

Choosing $t_N \to \infty$ such that $t_N(2q+1)^4/N \to 0$, which is possible under $q^4/N \to 0$ (for instance, $t_N = N^{1/2}(2q+1)^{-2}$), shows that $(2q+1)P_2(\ell,N) \to 0$. This finishes the proof of Proposition 4.1. □

A.2. Proofs of Theorems 5.1 and 5.2. The proof of Theorem 5.1 relies on Theorem A.1 of Aue et al. [5], which we state here for ease of reference.

Theorem A.2. Suppose $\{\boldsymbol\xi_n\}$ is a $d$-dimensional $L^2$-m-approximable mean zero sequence. Then

$$N^{-1/2}S_N(\cdot, \boldsymbol\xi) \xrightarrow{d} W_{\Sigma(\boldsymbol\xi)}(\cdot), \tag{A.8}$$

where $\{W_{\Sigma(\boldsymbol\xi)}(x), x \in [0,1]\}$ is a mean zero Gaussian process with covariances

$$\operatorname{Cov}\bigl(W_{\Sigma(\boldsymbol\xi)}(x), W_{\Sigma(\boldsymbol\xi)}(y)\bigr) = \min(x,y)\,\Sigma(\boldsymbol\xi).$$

The convergence in (A.8) is in the $d$-dimensional Skorokhod space $D^d([0,1])$.

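Proof of Theorem 5.1. Let

$$G_N(x, \boldsymbol\xi) = \frac1N L_N(x, \boldsymbol\xi)^T\,\hat\Sigma(\boldsymbol\xi)^{-1}L_N(x, \boldsymbol\xi),$$

so that $T_N(d) = \int_0^1 G_N(x, \hat{\boldsymbol\eta})\,dx$. We notice that replacing $L_N(x, \hat{\boldsymbol\eta})$ with $L_N(x, \hat{\boldsymbol\beta})$ does not change the test statistic in (5.2). Furthermore, since by the second part of Proposition 4.1 $|\hat\Sigma(\hat{\boldsymbol\eta}) - \hat\Sigma(\hat{\boldsymbol\beta})| = o_P(1)$, it is enough to study the limiting behavior of the sequence $G_N(x, \hat{\boldsymbol\beta})$. This is done by first deriving the asymptotics of $G_N(x, \boldsymbol\beta)$ and then analyzing the effect of replacing $\boldsymbol\beta$ with $\hat{\boldsymbol\beta}$.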

Let $\beta_i^{(m)}$ be the $m$-dependent approximations for $\beta_i$ which are obtained by replacing $Y_i(t)$ in (4.10) by $Y_i^{(m)}(t)$. For a vector $v$ in $\mathbb{R}^d$ we let $|v|$ be its Euclidean norm. Then

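$$\begin{aligned}
E\big|\beta_1-\beta_1^{(m)}\big|^2 &= \sum_{\ell=1}^{d}E\Big(\int\big(Y_1(t)-Y_1^{(m)}(t)\big)v_\ell(t)\,dt\Big)^2\\
&\le \sum_{\ell=1}^{d}E\int\big(Y_1(t)-Y_1^{(m)}(t)\big)^2dt\int v_\ell^2(t)\,dt\\
&= d\,\nu_2^2\big(Y_1-Y_1^{(m)}\big).
\end{aligned}$$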



Since by Lyapunov's inequality we have $\nu_2(Y_1-Y_1^{(m)}) \le \nu_4(Y_1-Y_1^{(m)})$, (2.3) yields that $\sum_{m\ge1}\big(E|\beta_1-\beta_1^{(m)}|^2\big)^{1/2} < \infty$. Thus Theorem A.2 implies that

$$N^{-1/2}S_N(x,\beta) \xrightarrow{D^d[0,1]} W(\beta)(x).$$

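The coordinatewise absolute convergence of the series $\Sigma(\beta)$ follows from part (a) of Theorem 4.2. By assumption the estimator $\hat\Sigma(\beta)$ is consistent, and consequently

$$\int_0^1 G_N(x,\beta)\,dx \xrightarrow{\ d\ } \sum_{\ell=1}^{d}\int_0^1 B_\ell^2(x)\,dx,$$

where $B_1,\dots,B_d$ are independent standard Brownian bridges, follows from the continuous mapping theorem.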

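We turn now to the effect of changing $G_N(x,\beta)$ to $G_N(x,\hat\beta)$. Due to the quadratic structure of $G_N(x,\beta)$, we have $G_N(x,\beta) = G_N(x,C\beta)$ when $C = \mathrm{diag}(\hat c_1,\hat c_2,\dots,\hat c_d)$, because $\hat c_\ell \in \{-1,1\}$ and hence $C^2 = I$. To finish the proof it is thus sufficient to show that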

$$\sup_{x\in[0,1]}\frac{1}{\sqrt N}\big|S_N(x,\hat\beta)-S_N(x,C\beta)\big| = o_P(1) \tag{A.9}$$

and

$$\big|\hat\Sigma(\hat\beta)-\hat\Sigma(C\beta)\big| = o_P(1). \tag{A.10}$$

Relation (A.10) follows from Proposition 4.1. To show (A.9) we observe that by the Cauchy-Schwarz inequality and Theorem 3.2

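$$\begin{aligned}
\sup_{x\in[0,1]}\frac1N\big|S_N(x,\hat\beta)-S_N(x,C\beta)\big|^2 &= \sup_{x\in[0,1]}\frac1N\sum_{\ell=1}^{d}\Big(\int\sum_{n=1}^{\lfloor Nx\rfloor}Y_n(t)\big(\hat v_\ell(t)-\hat c_\ell v_\ell(t)\big)\,dt\Big)^2\\
&\le \frac1N\int\max_{1\le k\le N}\Big(\sum_{n=1}^{k}Y_n(t)\Big)^2dt\;\sum_{\ell=1}^{d}\|\hat v_\ell-\hat c_\ell v_\ell\|^2\\
&= O_P(N^{-1})\,\frac1N\int\max_{1\le k\le N}\Big(\sum_{n=1}^{k}Y_n(t)\Big)^2dt.
\end{aligned}$$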

Define

$$g(t) = E|Y_1(t)|^2 + 2\big(E|Y_1(t)|^2\big)^{1/2}\sum_{r\ge1}\big(E|Y_{1+r}(t)-Y_{1+r}^{(r)}(t)|^2\big)^{1/2}.$$

Then by similar arguments as in Section A.1 we have

$$E\Big(\sum_{n=i+1}^{i+k}Y_n(t)\Big)^2 \le k\,g(t), \qquad i\ge0,\ k\ge1.$$



Hence by Menshov's inequality (see, e.g., Billingsley [13], Section 10) we infer that

$$E\max_{1\le k\le N}\Big(\sum_{n=1}^{k}Y_n(t)\Big)^2 \le (\log\log 4N)^2\,N\,g(t).$$

Notice that (2.3) implies $\int g(t)\,dt < \infty$. In turn we obtain that

$$\frac1N\int\max_{1\le k\le N}\Big(\sum_{n=1}^{k}Y_n(t)\Big)^2dt = O_P\big((\log\log N)^2\big),$$

which proves (A.9). □
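The objects in this proof can be made concrete with a small sketch. Assuming the statistic has the form $\int_0^1 G_N(x,\beta)\,dx$ with $G_N(x,\beta) = N^{-1}L_N(x,\beta)^T\hat\Sigma^{-1}L_N(x,\beta)$ and $L_N(x,\beta) = S_N(x,\beta) - xS_N(1,\beta)$ (our reading of the setup, assumed here rather than restated from Section 5), the code approximates the integral by a Riemann sum over $x = k/N$. It then checks numerically that replacing each score vector $\beta_n$ by $C\beta_n$ and $\hat\Sigma$ by $C\hat\Sigma C$, with $C$ a diagonal sign matrix, leaves the value unchanged, which is the invariance $G_N(x,\beta) = G_N(x,C\beta)$ used above. The score vectors and the matrix are made-up illustrative inputs.

```python
# Sketch: scores[n] plays the role of beta_n in R^d, sigma that of a positive
# definite covariance estimate (both illustrative). int_0^1 G_N(x) dx is
# approximated by a Riemann sum over the grid x = k/N.

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def cusum_statistic(scores, sigma):
    """Riemann-sum approximation of int_0^1 G_N(x) dx."""
    n, d = len(scores), len(scores[0])
    total = [sum(v[j] for v in scores) for j in range(d)]        # S_N(1)
    partial = [0.0] * d                                          # S_N(k/N)
    acc = 0.0
    for k in range(1, n + 1):
        partial = [partial[j] + scores[k - 1][j] for j in range(d)]
        x = k / n
        bridge = [partial[j] - x * total[j] for j in range(d)]   # L_N(k/N)
        z = solve(sigma, bridge)
        acc += sum(bridge[j] * z[j] for j in range(d)) / n       # G_N(k/N)
    return acc / n

def flip_signs(scores, sigma, signs):
    """Apply C = diag(signs): scores -> C scores, sigma -> C sigma C."""
    d = len(signs)
    new_scores = [[signs[j] * v[j] for j in range(d)] for v in scores]
    new_sigma = [[signs[i] * sigma[i][j] * signs[j] for j in range(d)]
                 for i in range(d)]
    return new_scores, new_sigma

scores = [[1.0, 0.5], [-0.2, 1.1], [0.3, -0.7], [0.9, 0.4], [-1.2, 0.2]]
sigma = [[2.0, 0.3], [0.3, 1.0]]
t = cusum_statistic(scores, sigma)
t_flipped = cusum_statistic(*flip_signs(scores, sigma, [-1, 1]))
print(t, t_flipped)  # equal up to rounding error
```

The invariance holds because $(CL)^T(C\hat\Sigma C)^{-1}(CL) = L^T\hat\Sigma^{-1}L$ whenever $C^2 = I$.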

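Proof of Theorem 5.2. Notice that if the mean function changes from $\mu_1(t)$ to $\mu_2(t)$ at time $k^* = \lfloor N\theta\rfloor$, then $L_N(x,\hat\beta)$ can be written as

$$L_N(x,\hat\beta) = \tilde L_N(x,\hat\beta) + N\begin{cases} x(1-\theta)\,[\hat\mu_1-\hat\mu_2], & \text{if } x\le\theta,\\ \theta(1-x)\,[\hat\mu_1-\hat\mu_2], & \text{if } x>\theta,\end{cases} \tag{A.11}$$

where $\tilde L_N(x,\hat\beta)$ denotes the process $L_N$ computed from the centered curves $Y_n(t)-EY_n(t)$,

$$\hat\mu_1 = \Big(\int\mu_1(t)\hat v_1(t)\,dt,\ \int\mu_1(t)\hat v_2(t)\,dt,\ \dots,\ \int\mu_1(t)\hat v_d(t)\,dt\Big)^T$$

and $\hat\mu_2$ is defined analogously.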
It follows from (A.11) that $T_N(d)$ can be expressed as the sum of three terms:

$$T_N(d) = T_{1,N}(d) + T_{2,N}(d) + T_{3,N}(d),$$

where

$$T_{1,N}(d) = \frac1N\int_0^1\tilde L_N(x,\hat\beta)^T\,\hat\Sigma(\hat\eta)^{-1}\,\tilde L_N(x,\hat\beta)\,dx;$$

$$T_{2,N}(d) = \frac{N}{3}\,\theta^2(1-\theta)^2\,[\hat\mu_1-\hat\mu_2]^T\,\hat\Sigma(\hat\eta)^{-1}\,[\hat\mu_1-\hat\mu_2];$$

$$T_{3,N}(d) = \int_0^1 g(x,\theta)\,\tilde L_N(x,\hat\beta)^T\,\hat\Sigma(\hat\eta)^{-1}\,[\hat\mu_1-\hat\mu_2]\,dx,$$

with $g(x,\theta) = 2\{x(1-\theta)I\{x\le\theta\} + \theta(1-x)I\{x>\theta\}\}$.
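If the projected mean is $\mu_1$ for $n \le k^*$ and $\mu_2$ afterwards, with $k^* = N\theta$ an integer, the deterministic part of $S_N(x) - xS_N(1)$ at $x = k/N$ equals $Nx(1-\theta)(\mu_1-\mu_2)$ for $x \le \theta$ and $N\theta(1-x)(\mu_1-\mu_2)$ for $x > \theta$. The square of this piecewise linear function integrates to $\theta^2(1-\theta)^2/3$, which is the factor in front of $N[\hat\mu_1-\hat\mu_2]^T\hat\Sigma(\hat\eta)^{-1}[\hat\mu_1-\hat\mu_2]$ in the quadratic term. The sketch below checks both facts with exact rational arithmetic, assuming for simplicity a scalar mean equal to 1 before the change and 0 after it (so the mean difference is one).

```python
from fractions import Fraction

# Exact check of the drift term, assuming means equal to 1 for n <= k* and 0 afterwards.

def cusum_of_means(n, k_star, k):
    """S_N(x) - x S_N(1) at x = k/n for the mean sequence described above."""
    x = Fraction(k, n)
    s_x = min(k, k_star)      # sum of the means up to floor(N x) = k
    s_1 = k_star              # sum of all N means
    return s_x - x * s_1

def drift(x, theta):
    """x(1 - theta) for x <= theta and theta(1 - x) for x > theta."""
    return x * (1 - theta) if x <= theta else theta * (1 - x)

def integral_drift_squared(theta):
    """int_0^1 drift(x, theta)^2 dx, integrating each polynomial piece exactly."""
    left = (1 - theta) ** 2 * theta ** 3 / 3    # int_0^theta x^2 (1-theta)^2 dx
    right = theta ** 2 * (1 - theta) ** 3 / 3   # int_theta^1 theta^2 (1-x)^2 dx
    return left + right

n, k_star = 10, 3
theta = Fraction(k_star, n)
assert all(cusum_of_means(n, k_star, k) == n * drift(Fraction(k, n), theta)
           for k in range(n + 1))
print(integral_drift_squared(theta), theta ** 2 * (1 - theta) ** 2 / 3)
```

The two printed fractions agree because $\theta^3(1-\theta)^2 + \theta^2(1-\theta)^3 = \theta^2(1-\theta)^2$.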

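Since $\Sigma$ in (5.4) is positive definite (p.d.), $\hat\Sigma(\hat\eta)$ is almost surely p.d. for all $N$ large enough (the threshold for $N$ being random). Hence for large enough $N$ the term $T_{1,N}(d)$ is nonnegative. We will show that $N^{-1}T_{2,N}(d) \ge \kappa_1 + o_P(1)$ for a positive constant $\kappa_1$, and $N^{-1}T_{3,N}(d) = o_P(1)$. To this end we notice the following. Ultimately all eigenvalues of $\hat\Sigma(\hat\eta)$ are positive. Let $\lambda^*(N)$ and $\lambda_*(N)$ denote the largest and the smallest eigenvalue, respectively. By Lemma 3.1, $\lambda^*(N) \to \lambda^*$ a.s. and $\lambda_*(N) \to \lambda_*$ a.s., where $\lambda^*$ and $\lambda_*$ are the largest and smallest eigenvalue of $\Sigma$. Next we claim that

$$|\hat\mu_1-\hat\mu_2| = |\mu_1-\mu_2| + o_P(1),$$

where $\mu_1$ and $\mu_2$ are defined like $\hat\mu_1$ and $\hat\mu_2$ with $\hat v_\ell$ replaced by $v_\ell$.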

To obtain this, we use the relation $\|\hat v_j - \hat c_jv_j\| = o_P(1)$, which can be proven similarly to Lemma A.1 of Berkes et al. [7], except that the law of large numbers in a Hilbert space must be replaced by the ergodic theorem. The ergodicity of $\{Y_n\}$ follows from the representation $Y_n = f(\varepsilon_n,\varepsilon_{n-1},\dots)$. Notice that because of the presence of a change point it cannot be claimed that $\|\hat v_j - \hat c_jv_j\| = O_P(N^{-1/2})$.

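It follows that if $N$ is large enough, then

$$[\hat\mu_1-\hat\mu_2]^T\,\hat\Sigma(\hat\eta)^{-1}\,[\hat\mu_1-\hat\mu_2] \ge \frac{1}{\lambda^*(N)}\,|\hat\mu_1-\hat\mu_2|^2 = \frac{1}{\lambda^*}\,|\mu_1-\mu_2|^2 + o_P(1).$$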

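To verify $N^{-1}T_{3,N}(d) = o_P(1)$, observe that

$$\begin{aligned}
\sup_{x\in[0,1]}\big|\tilde L_N(x,\hat\beta)^T\,\hat\Sigma(\hat\eta)^{-1}\,[\hat\mu_1-\hat\mu_2]\big| &\le \sup_{x\in[0,1]}|\tilde L_N(x,\hat\beta)| \times |\hat\Sigma(\hat\eta)^{-1}| \times |\hat\mu_1-\hat\mu_2|\\
&= o_P(N)\,|\hat\mu_1-\hat\mu_2|.
\end{aligned}$$

We used the matrix norm $|A| = \sup_{|x|\le1}|Ax|$ and $|\hat\Sigma(\hat\eta)^{-1}| = 1/\lambda_*(N) = O_P(1)$.
□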
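The last two steps rest on two standard facts about a symmetric positive definite matrix $A$ with largest eigenvalue $\lambda^*$ and smallest eigenvalue $\lambda_*$: $x^TA^{-1}x \ge |x|^2/\lambda^*$, and $|A^{-1}| = 1/\lambda_*$ in the operator norm $|A| = \sup_{|x|\le1}|Ax|$. A minimal check for one illustrative $2\times2$ matrix (the entries are arbitrary choices):

```python
import math

# For a symmetric positive definite A = [[a, b], [b, c]]:
# (i)  x^T A^{-1} x >= |x|^2 / lambda_max(A);
# (ii) the operator norm of A^{-1}, i.e. its largest eigenvalue, equals 1 / lambda_min(A).

def eig_sym_2x2(a, b, c):
    """Return (largest, smallest) eigenvalue of [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean + radius, mean - radius

def inverse_2x2(a, b, c):
    """Entries (a', b', c') of the inverse of [[a, b], [b, c]]."""
    det = a * c - b * b
    return c / det, -b / det, a / det

def quad_form_inverse(a, b, c, x):
    """x^T A^{-1} x."""
    ia, ib, ic = inverse_2x2(a, b, c)
    return ia * x[0] ** 2 + 2 * ib * x[0] * x[1] + ic * x[1] ** 2

a, b, c = 2.0, 0.5, 1.0
lam_max, lam_min = eig_sym_2x2(a, b, c)
x = [1.0, -2.0]
lhs = quad_form_inverse(a, b, c, x)
rhs = (x[0] ** 2 + x[1] ** 2) / lam_max
norm_inverse = eig_sym_2x2(*inverse_2x2(a, b, c))[0]
print(lhs >= rhs, abs(norm_inverse - 1 / lam_min) < 1e-12)
```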
A.3. Proof of Theorem 6.1. We first establish a technical bound which implies the consistency of the estimator $\hat\sigma_{\ell k}$ given in (6.2). Let $\hat c_\ell = \mathrm{sign}(\langle\hat v_\ell, v_\ell\rangle)$ and $\hat d_k = \mathrm{sign}(\langle\hat u_k, u_k\rangle)$.

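Lemma A.1. Under the assumptions of Theorem 6.1 we have

$$\limsup_{N\to\infty} N\,E|\hat\sigma_{\ell k}-\hat c_\ell\hat d_k\sigma_{\ell k}|^2 \le \kappa_1\Big(1+\frac{1}{\alpha_\ell^2}+\frac{1}{\alpha_k'^2}\Big),$$

where $\kappa_1$ is a constant independent of $k$ and $\ell$.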

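Proof. It follows from elementary inequalities that $|\hat\sigma_{\ell k}-\hat c_\ell\hat d_k\sigma_{\ell k}|^2 \le 2T_1^2 + 2T_2^2$, where

$$T_1 = \frac1N\sum_{i=1}^{N}\iint\big(X_i(s)Y_i(t)-E[X_i(s)Y_i(t)]\big)\,\hat u_k(t)\hat v_\ell(s)\,dt\,ds;$$

$$T_2 = \frac1N\sum_{i=1}^{N}\iint E[X_i(s)Y_i(t)]\,\big[\hat u_k(t)\hat v_\ell(s)-\hat d_ku_k(t)\,\hat c_\ell v_\ell(s)\big]\,dt\,ds.$$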

i=i ^ ^ 

By the Cauchy-Schwarz inequality and (A.2) we obtain

$$T_1^2 \le \iint\Big(\frac1N\sum_{i=1}^{N}\big(X_i(s)Y_i(t)-E[X_i(s)Y_i(t)]\big)\Big)^2dt\,ds;$$

$$T_2^2 \le 2\nu_2^2(X)\,\nu_2^2(Y)\big(\|\hat u_k-\hat d_ku_k\|^2+\|\hat v_\ell-\hat c_\ell v_\ell\|^2\big).$$

Hence by arguments similar to those used in the proof of Theorem 3.1 we get $N\,ET_1^2 = O(1)$. The proof now follows immediately from Lemma 3.2 and Theorem 3.1. □

Now we are ready to verify (6.4). We have

$$\psi_{KL}(t,s) = \sum_{k=1}^{K}\sum_{\ell=1}^{L}\lambda_\ell^{-1}\sigma_{\ell k}\,u_k(t)v_\ell(s).$$
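For a numerical picture of $\psi_{KL}$ and its plug-in counterpart $\hat\psi_{KL}(t,s) = \sum_{k\le K}\sum_{\ell\le L}\hat\lambda_\ell^{-1}\hat\sigma_{\ell k}\hat u_k(t)\hat v_\ell(s)$, the sketch below evaluates such a kernel on a grid. It assumes $\hat\sigma_{\ell k} = N^{-1}\sum_i\langle X_i,\hat v_\ell\rangle\langle Y_i,\hat u_k\rangle$, the form implied by the split into $T_1$ and $T_2$ in the proof of Lemma A.1, replaces integrals by midpoint Riemann sums, and takes the estimated eigenfunctions and eigenvalues as given inputs (computing them is not shown). The usage lines also illustrate why the sign corrections $\hat c_\ell$, $\hat d_k$ are harmless: flipping the sign of $\hat v_\ell$ flips $\hat\sigma_{\ell k}$ and leaves the kernel unchanged.

```python
import math

# Hypothetical inputs: curves sampled on a uniform midpoint grid of [0, 1],
# estimated eigenfunctions v_hat (for X) and u_hat (for Y), eigenvalues lam_hat.

def inner(f, g, h):
    """Riemann-sum approximation of int f(t) g(t) dt with grid spacing h."""
    return h * sum(a * b for a, b in zip(f, g))

def sigma_hat(xs, ys, v_hat, u_hat, h):
    """Matrix sigma_hat[l][k] = N^{-1} sum_i <X_i, v_hat_l> <Y_i, u_hat_k>."""
    n = len(xs)
    return [[sum(inner(x, vl, h) * inner(y, uk, h) for x, y in zip(xs, ys)) / n
             for uk in u_hat]
            for vl in v_hat]

def psi_hat(lam_hat, sig, u_hat, v_hat):
    """psi[a][b] = sum_k sum_l sig[l][k] / lam_hat[l] * u_hat_k(t_a) * v_hat_l(s_b)."""
    n_t, n_s = len(u_hat[0]), len(v_hat[0])
    return [[sum(sig[l][k] / lam_hat[l] * u_hat[k][a] * v_hat[l][b]
                 for k in range(len(u_hat)) for l in range(len(v_hat)))
             for b in range(n_s)]
            for a in range(n_t)]

def l2_distance_squared(p, q, h_t, h_s):
    """Riemann-sum approximation of iint (p - q)^2 dt ds."""
    return h_t * h_s * sum((a - b) ** 2 for ra, rb in zip(p, q) for a, b in zip(ra, rb))

m = 50
h = 1.0 / m
grid = [(j + 0.5) * h for j in range(m)]
e1 = [math.sqrt(2) * math.sin(math.pi * t) for t in grid]
e2 = [math.sqrt(2) * math.sin(2 * math.pi * t) for t in grid]
minus_e1 = [-v for v in e1]
sig = sigma_hat([e1], [e2], [e1], [e2], h)          # one pair: X_1 = e1, Y_1 = e2
psi = psi_hat([0.5], sig, [e2], [e1])
psi_flipped = psi_hat([0.5], sigma_hat([e1], [e2], [minus_e1], [e2], h), [e2], [minus_e1])
print(round(sig[0][0], 6), l2_distance_squared(psi, psi_flipped, h, h))
```

The second printed value is zero: the product $\hat\sigma_{\ell k}\hat v_\ell$ does not depend on the sign chosen for $\hat v_\ell$.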

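The orthogonality of the sequences $\{u_k\}$ and $\{v_\ell\}$ and (6.3) imply that

$$\iint\Big[\mathop{\sum\sum}_{k>K\ \text{or}\ \ell>L}\lambda_\ell^{-1}\sigma_{\ell k}\,u_k(t)v_\ell(s)\Big]^2dt\,ds = \mathop{\sum\sum}_{k>K\ \text{or}\ \ell>L}\iint\lambda_\ell^{-2}\sigma_{\ell k}^2\,u_k^2(t)v_\ell^2(s)\,dt\,ds = \mathop{\sum\sum}_{k>K\ \text{or}\ \ell>L}\lambda_\ell^{-2}\sigma_{\ell k}^2 \to 0 \quad (K,L\to\infty).$$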
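Therefore, letting

$$\hat\psi_{KL}(t,s) = \sum_{k=1}^{K}\sum_{\ell=1}^{L}\hat\lambda_\ell^{-1}\hat\sigma_{\ell k}\,\hat u_k(t)\hat v_\ell(s),$$

(6.4) will follow once we show that

$$\iint\big[\psi_{KL}(t,s)-\hat\psi_{KL}(t,s)\big]^2dt\,ds \xrightarrow{P} 0 \quad (N\to\infty).$$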

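Notice that by the Cauchy-Schwarz inequality the latter relation is implied by

$$KL\sum_{k=1}^{K}\sum_{\ell=1}^{L}\iint\big[\lambda_\ell^{-1}\sigma_{\ell k}u_k(t)v_\ell(s)-\hat\lambda_\ell^{-1}\hat\sigma_{\ell k}\hat u_k(t)\hat v_\ell(s)\big]^2dt\,ds \xrightarrow{P} 0 \quad (N\to\infty). \tag{A.12}$$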

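A repeated application of (A.2) and some basic algebra yield

$$\begin{aligned}
\tfrac14\big[\lambda_\ell^{-1}\sigma_{\ell k}u_k(t)v_\ell(s)-\hat\lambda_\ell^{-1}\hat\sigma_{\ell k}\hat u_k(t)\hat v_\ell(s)\big]^2
&\le \lambda_\ell^{-2}|\hat\sigma_{\ell k}-\hat c_\ell\hat d_k\sigma_{\ell k}|^2\,\hat u_k^2(t)\hat v_\ell^2(s) + \hat\sigma_{\ell k}^2|\hat\lambda_\ell^{-1}-\lambda_\ell^{-1}|^2\,\hat u_k^2(t)\hat v_\ell^2(s)\\
&\quad + \sigma_{\ell k}^2\lambda_\ell^{-2}|\hat u_k(t)-\hat d_ku_k(t)|^2\,\hat v_\ell^2(s) + \sigma_{\ell k}^2\lambda_\ell^{-2}|\hat v_\ell(s)-\hat c_\ell v_\ell(s)|^2\,u_k^2(t).
\end{aligned}$$

Hence

$$\begin{aligned}
\tfrac14\iint\big[\lambda_\ell^{-1}\sigma_{\ell k}u_k(t)v_\ell(s)-\hat\lambda_\ell^{-1}\hat\sigma_{\ell k}\hat u_k(t)\hat v_\ell(s)\big]^2dt\,ds
&\le \lambda_\ell^{-2}|\hat\sigma_{\ell k}-\hat c_\ell\hat d_k\sigma_{\ell k}|^2 + \hat\sigma_{\ell k}^2|\hat\lambda_\ell^{-1}-\lambda_\ell^{-1}|^2\\
&\quad + \sigma_{\ell k}^2\lambda_\ell^{-2}\big(\|\hat u_k-\hat d_ku_k\|^2+\|\hat v_\ell-\hat c_\ell v_\ell\|^2\big).
\end{aligned}$$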
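The four-term bound can be traced back to an exact pointwise identity. Writing $\sigma' = \hat c_\ell\hat d_k\sigma_{\ell k}$, $u' = \hat d_ku_k$ and $v' = \hat c_\ell v_\ell$ (so that $\sigma'u'v' = \sigma_{\ell k}u_kv_\ell$), one has $\lambda_\ell^{-1}\sigma_{\ell k}u_kv_\ell - \hat\lambda_\ell^{-1}\hat\sigma_{\ell k}\hat u_k\hat v_\ell = \lambda_\ell^{-1}(\sigma'-\hat\sigma_{\ell k})\hat u_k\hat v_\ell + (\lambda_\ell^{-1}-\hat\lambda_\ell^{-1})\hat\sigma_{\ell k}\hat u_k\hat v_\ell + \lambda_\ell^{-1}\sigma'(u'-\hat u_k)\hat v_\ell + \lambda_\ell^{-1}\sigma'u'(v'-\hat v_\ell)$, and $(a_1+\dots+a_4)^2 \le 4(a_1^2+\dots+a_4^2)$ produces the factor $\tfrac14$. This is one decomposition consistent with the bound, not necessarily the exact algebra intended; the sketch below checks it at a single point $(t,s)$ on random scalar inputs.

```python
import random

# Scalars at a fixed (t, s): lam, sig, u, v are population values, the *_hat
# versions are estimates, and c, d in {-1, 1} are the sign corrections.

def four_terms(lam, lam_hat, sig, sig_hat, u, u_hat, v, v_hat, c, d):
    s1 = (c * d * sig - sig_hat) * u_hat * v_hat / lam
    s2 = (1 / lam - 1 / lam_hat) * sig_hat * u_hat * v_hat
    s3 = c * d * sig * (d * u - u_hat) * v_hat / lam
    s4 = c * d * sig * d * u * (c * v - v_hat) / lam
    return s1, s2, s3, s4

random.seed(1)
for _ in range(10_000):
    lam, lam_hat = random.uniform(0.1, 2), random.uniform(0.1, 2)
    sig, sig_hat, u, u_hat, v, v_hat = (random.uniform(-2, 2) for _ in range(6))
    c, d = random.choice((-1, 1)), random.choice((-1, 1))
    target = sig * u * v / lam - sig_hat * u_hat * v_hat / lam_hat
    terms = four_terms(lam, lam_hat, sig, sig_hat, u, u_hat, v, v_hat, c, d)
    # Identity: the four terms add up to the difference.
    assert abs(sum(terms) - target) <= 1e-9 * (1 + abs(target))
    # Elementary inequality: (a1 + a2 + a3 + a4)^2 <= 4 (a1^2 + ... + a4^2).
    assert target ** 2 / 4 <= sum(t * t for t in terms) * (1 + 1e-9) + 1e-12
print("ok")
```

Squaring each term and integrating over $t$ and $s$ (with $\|\hat u_k\| = \|\hat v_\ell\| = \|u_k\| = 1$) gives the integrated bound term by term.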

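Thus in order to get (A.12) we will show that

$$KL\sum_{k=1}^{K}\sum_{\ell=1}^{L}\lambda_\ell^{-2}|\hat\sigma_{\ell k}-\hat c_\ell\hat d_k\sigma_{\ell k}|^2 \xrightarrow{P} 0; \tag{A.13}$$

$$KL\sum_{k=1}^{K}\sum_{\ell=1}^{L}\hat\sigma_{\ell k}^2\,|\hat\lambda_\ell^{-1}-\lambda_\ell^{-1}|^2 \xrightarrow{P} 0; \tag{A.14}$$

$$KL\sum_{k=1}^{K}\sum_{\ell=1}^{L}\sigma_{\ell k}^2\lambda_\ell^{-2}\big(\|\hat u_k-\hat d_ku_k\|^2+\|\hat v_\ell-\hat c_\ell v_\ell\|^2\big) \xrightarrow{P} 0. \tag{A.15}$$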

We start with (A.13). By Lemma A.1 and Assumption 6.1 we have



k=l l=\ 
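Next we prove relation (A.14). In order to shorten the proof we replace $\hat\sigma_{\ell k}$ by $\sigma_{\ell k}$; otherwise we would need a further intermediate step, requiring arguments similar to those which follow. Now for any $0<\epsilon<1$ we have

$$\begin{aligned}
P\Big(KL\sum_{k=1}^{K}\sum_{\ell=1}^{L}\sigma_{\ell k}^2\,|\hat\lambda_\ell^{-1}-\lambda_\ell^{-1}|^2 > \epsilon\Big)
&\le P\Big(\frac{KL}{(1-\epsilon)^2}\sum_{k=1}^{K}\sum_{\ell=1}^{L}\sigma_{\ell k}^2\,\frac{|\hat\lambda_\ell-\lambda_\ell|^2}{\lambda_\ell^4} > \epsilon\Big) + \sum_{\ell=1}^{L}P\big(|\hat\lambda_\ell-\lambda_\ell|^2 > \epsilon^2\lambda_\ell^2\big)\\
&\le K_2\Big(\frac{KL^2}{\epsilon(1-\epsilon)^2N\lambda_L^3} + \frac{L}{\epsilon^2N\lambda_L^2}\Big),
\end{aligned}$$

where the first inequality distinguishes whether $|\hat\lambda_\ell-\lambda_\ell| \le \epsilon\lambda_\ell$ holds for all $\ell \le L$ (on this event $\hat\lambda_\ell \ge (1-\epsilon)\lambda_\ell$), and the second follows by an application of the Markov inequality, Theorem 3.2 and the bound $\sum_k\sigma_{\ell k}^2 \le \lambda_\ell\,\nu_2^2(Y)$. According to our Assumption 6.1 this also goes to zero for $N \to \infty$.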

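Finally we prove (A.15). By Lemma 3.2 and Theorem 3.1 we infer that

$$\begin{aligned}
E\Big(KL\sum_{k=1}^{K}\sum_{\ell=1}^{L}\sigma_{\ell k}^2\lambda_\ell^{-2}\big(\|\hat u_k-\hat d_ku_k\|^2+\|\hat v_\ell-\hat c_\ell v_\ell\|^2\big)\Big)
&\le \kappa_3\,\frac{KL}{N}\sum_{k=1}^{K}\sum_{\ell=1}^{L}\sigma_{\ell k}^2\lambda_\ell^{-2}\Big(\frac{1}{\alpha_k'^2}+\frac{1}{\alpha_\ell^2}\Big)\\
&\le 2\kappa_3\,\frac{KL}{N\min\{h_L,h_K'\}^2}\sum_{k\ge1}\sum_{\ell\ge1}\sigma_{\ell k}^2\lambda_\ell^{-2}.
\end{aligned}$$

Assumption 6.1(h) ensures that the last term goes to zero. The proof is now complete.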